Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I have a wordlist with numbers, English words, and Bengali words in one column and their frequencies in another. The columns have no headers. I need the words with frequencies between 5 and 300. This is the code I am using. It is not working.
wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")
for word in wordlist:
    if word[1] >= 3
        print(word[0])
    elif word[1] <= 300
        print(word[0])
This is giving me a syntax error.
  File "<stdin>", line 2
    if word[1] >= 3
                   ^
SyntaxError: invalid syntax
Can anyone please help?
You should add a colon (:) after your if and elif conditions to fix this SyntaxError:
wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")
for word in wordlist:
    if word[1] >= 3:
        print word[0]
    elif word[1] <= 300:
        print word[0]
Read this:
https://docs.python.org/2/tutorial/controlflow.html
Also, here is one useful tip: when Python gives you a SyntaxError for some line, always look at the previous line, then at the following one.
There are a few problems with your code; I will add a full explanation in an hour or so. In the meantime, see how it should look and consult the docs:
First, it is safer to open files with a with open() block (see https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)
filepath = 'C:/Python27/bengali_wordlist_full.txt'
with open(filepath) as f:
    content = f.read().decode('string-escape').decode("utf-8")
    # do you really need all of this decoding?
Now content holds the text from the file: this is one long string, with '\n' characters marking the line endings. We can split it into a list of lines:
lines = content.splitlines()
and parse one line at a time:
for line in lines:
    try:
        # split line into items, assign first to 'word', second to 'freq'
        word, freq = line.split('\t')  # assuming you have tab as separator
        freq = float(freq)  # we need to convert second item to numeric value from string
        if 5 <= freq <= 300:  # you can 'chain' comparisons like this
            print word
    except ValueError:
        # this happens if split() gives more or fewer than two items or float() fails
        print "Could not parse this line:", line
        continue
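For readers on Python 3, the same idea can be sketched as a small function. This is only an illustration under stated assumptions: the function name words_in_range, the tab separator, and the sample rows are all made up here and are not taken from the asker's file.

```python
def words_in_range(lines, low=5, high=300, sep="\t"):
    """Return the words whose frequency lies between low and high (inclusive).

    Assumes each line looks like "<word><sep><frequency>"; lines that do not
    fit that shape are skipped instead of raising an error.
    """
    selected = []
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != 2:
            continue  # not exactly two columns
        word, freq_text = parts
        try:
            freq = float(freq_text)
        except ValueError:
            continue  # second column is not a number
        if low <= freq <= high:
            selected.append(word)
    return selected


# Hypothetical sample rows, standing in for the real word list
sample = ["the\t950", "বাংলা\t42", "2015\t5", "rare\t3", "broken line"]
print(words_in_range(sample))
```

With a real file, the lines could come from open(filepath, encoding="utf-8") inside a with block; in Python 3 that replaces the chained decode() calls.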
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I'm a newbie here. I'm working on a program that counts the words that start with a capital letter, per line, inside a CSV file using Python. I used a regex, but I think it doesn't work. Here is the sample code that I made; unfortunately it doesn't give the output that I want. I hope you can help me.
import re
line_details = []
result = []
count = 0
total_lines = 0
class CapitalW(): #F8 Word that starts with capital letter count
    fh = open(r'20items.csv', "r", encoding = "ISO-8859-1").read()
    #next(fh)
    for line in fh.split("n"):
        total_lines += 1
        for line in re.findall('[A-Z]+[a-z]+$', fh):
            count+=1
        line_details.append("Line %d has %d Words that start with capital letter" %
                            (total_lines, count))
    for line in line_details:
        result7 = line
        print (result7)
The result should be as follows:
Line 1 has 2 Words that start with capital letter
Line 2 has 5 Words that start with capital letter
Line 3 has 1 Words that start with capital letter
Line 4 has 10 Words that start with capital letter
In the regex you don't need the $ character, because $ anchors the match to the end of the string, so [A-Z]+[a-z]+$ can only match a word at the very end. Use [A-Z]+[a-z]+ instead.
The other thing is that, judging from the encoding, you may be using characters that are not between a-z, for example é. If so, you have to add these to the pattern as well: [A-ZÉÖ]+[a-zéö]+, plus all the other special characters you need.
Assuming a fixed indentation and in addition to matebende's answer, these are the required further corrections:
for line in fh.split("n"): is supposed to be for line in fh.split("\n"):.
The initialization count = 0 has to be inside this for loop.
The fh in for line in re.findall('[A-Z]+[a-z]+$', fh): is wrong and has to be line.
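Putting these corrections together, a minimal sketch could look like the following. The function name count_capitalized_per_line and the sample text are assumptions made for illustration, and the \b word boundary is an addition that neither answer mentions; it keeps the pattern from matching in the middle of a word.

```python
import re

# One or more capitals followed by lowercase letters, starting at a word boundary.
# ASCII letters only; accented characters would have to be added to the classes.
CAPITALIZED = re.compile(r"\b[A-Z]+[a-z]+")


def count_capitalized_per_line(text):
    """Return one message per line with the number of capitalized words."""
    messages = []
    for number, line in enumerate(text.split("\n"), start=1):
        count = len(CAPITALIZED.findall(line))  # counted fresh for every line
        messages.append(
            "Line %d has %d Words that start with capital letter" % (number, count)
        )
    return messages


# Hypothetical sample text standing in for the contents of the CSV file
sample = "Hello World foo\nno capitals here\nOne Two Three"
for message in count_capitalized_per_line(sample):
    print(message)
```

Here the text would come from open(r'20items.csv', encoding="ISO-8859-1").read(), as in the question.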
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
I have a function that generates random characters (letters, digits, and whitespace) and appends each character to a list:
words = []
while words.count(' ') < 10:
    if len(words) == 0:
        # append character
    else:
        if words[-1].isdigit():  # checking if last character is digit
            # append only digit or whitespace
        else:
            # append character
As you can see, if the last (previous) character was a digit, I try to append only a digit or whitespace; otherwise I append any character. The problem is that when I run the code I get the error below:
'NoneType' object has no attribute 'isdigit' for line if words[-1].isdigit():. What am I doing wrong, and why is there None instead of a str?
Just make words a string:
(It's biased towards chars: whitespace is not a digit, so the output can switch from digits to chars but not the other way around, which is what you wanted, if I got you right.)
import random
sample_digit=" 0123456789"
sample_char=" abcdefghi"
sample_any=sample_digit+sample_char
words = ""
while words.count(' ') <10:
    if len(words) == 0:
        words+=sample_any[random.randint(0, len(sample_any)-1)]
    else:
        if words[-1].isdigit():
            words+=sample_digit[random.randint(0, len(sample_digit)-1)]
        else:
            words+=sample_char[random.randint(0, len(sample_char)-1)]
print(words)
A simple way of testing whether a character is a digit:
if words[-1] in '0123456789':
    ...
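If you would rather keep words as a list, as in the question, a sketch along these lines should behave the same way. The name generate_chars and its parameters are hypothetical, and the character pools are an assumption about what "any character" means. Since every appended item comes from random.choice on a string, the list can only ever contain one-character strings, never None.

```python
import random
import string

DIGITS = string.digits
LETTERS = string.ascii_letters


def generate_chars(spaces=10, rng=random):
    """Build a list of single characters until it contains `spaces` spaces.

    After a digit only a digit or a space may follow; otherwise any
    letter, digit, or space is allowed.
    """
    words = []
    while words.count(' ') < spaces:
        if words and words[-1].isdigit():
            pool = DIGITS + ' '
        else:
            pool = LETTERS + DIGITS + ' '
        words.append(rng.choice(pool))  # always appends a str
    return words


chars = generate_chars()
print(''.join(chars))
```

Passing rng=random.Random(0) gives a repeatable sequence, which is handy when checking the digit rule.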
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I have a Python string called line that I've split. line is a recurring string: I'm searching through an Excel file and printing out each line that contains a specific word, which I'll call search, a term that the user inputs. If the line doesn't contain search, it doesn't get printed.
I split the line, and printed out the search_index (index of the search term in the line).
s=line.split()
search_index = s.index(search) if inflected in s else "not in this line"
print(search_index)
If it doesn't exist in the line, the log says "not in this line" instead of a number; it was crashing when I didn't include that.
What I want to do is join this split back together, but only for a range of words with the searched term in the middle. So, something like
new_line=[search_index - 5:search_index + 5]
but I'm not sure that's right, since it gives me a "syntax invalid" error on the webpage.
How should this be properly done?
I think you have a typo (the name of the sequence is missing before your range [:]), but there's another thing as well. If search_index has been assigned a string, you can't subtract 5 from it or add 5 to it.
I'm not sure of the context so you'll have to tweak this to your needs, but this addresses those issues:
s = line.split()
if inflected in s:
    search_index = s.index(search)
    # search_index is a position in the word list s, so slice s, not the string line;
    # max() keeps the start from going negative near the beginning of the line
    new_line = s[max(search_index - 5, 0):search_index + 5]
else:
    print("not in this line")
When you slice a list or a string, you always have to put its name before the square brackets:
>>> line = 'hello world!'
>>> search_index = 3
>>> [search_index-3:search_index+3]
  File "<stdin>", line 1
    [search_index-3:search_index+3]
                   ^
SyntaxError: invalid syntax
>>> line[search_index-3:search_index+3]
'hello '
>>>
Therefore, instead of new_line = [search_index-5:search_index+5], use new_line = line[search_index-5:search_index+5].
Here is another example:
>>> line = 'Hello this is django on python'
>>> line = line.split()
>>> search_index = line.index('django')
>>> new_line = [search_index - 2:search_index + 2]
  File "<stdin>", line 1
    new_line = [search_index - 2:search_index + 2]
                                ^
SyntaxError: invalid syntax
>>> new_line = line[search_index - 2:search_index + 2]
>>> new_line
['this', 'is', 'django', 'on']
>>>
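Since the goal was to join the words back together with the search term in the middle, here is one possible sketch. The helper name context_window and its width parameter are made up for illustration. It clamps the start of the slice at 0, because a negative start index would count from the end of the list, and it returns None when the term is missing rather than a message string.

```python
def context_window(line, search, width=5):
    """Return up to `width` words on each side of `search`, joined by spaces.

    Returns None if `search` is not one of the words in `line`.
    """
    words = line.split()
    if search not in words:
        return None
    index = words.index(search)
    start = max(index - width, 0)  # avoid a negative start index
    return ' '.join(words[start:index + width + 1])


print(context_window('Hello this is django on python', 'django', width=2))
```

Note that the end of the slice is index + width + 1, so the window holds the same number of words on both sides of the term when they are available.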
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 months ago.
I have this code:
mylist = open('sortedwords.txt')
txt = mylist.read()
mylist = txt.split()
stuff = input('Type a word here: ')
def removeletters (word, Analysis):
    for char in range (len(Analysis)):
        if Analysis [char] in word:
            word = word.replace(Analysis[char],"",1)
    return word
def anagramSubset(word, textList):
    newWord = word
    for char in range(len(textList)):
        if textList[char] not in newWord:
            return False
        else:
            newWord = newWord.replace(textList[char],"",1)
    return True
def anagram(word, textList):
    savedWords =[]
    for checkword in textList:
        if len(word) == len(checkword) and anagramSubset(word, checkword):
            savedWords.append(checkword)
            print(checkword)
anagram(stuff, mylist)
It is supposed to take an input word, remove letters from the input word, then make a subset of words and save that to an array to print off of.
The problem is that the code will save every word that can be created from the input. E.g. an input of spot results in top, tops, stop, pots, pot, etc. The result should only have tops, pots, and stop.
What is wrong with the code, and how do I fix it?
I looked at the code and am wondering what the recursion is adding? The first pass does all of the computational work and then the recursion adds some extra stack frames and alters how output is printed. Am I making the wrong assumption that textList is a list of valid words split from a single line in a file?
When I run this locally with a particular word list, this gets the same effect (in the sense that it finds words whose letters are a subset) with less thrashing:
def anagram(word, textList):
    savedWords = []
    for checkword in textList:
        if anagramSubset(word, checkword):
            savedWords.append(checkword)
    print(savedWords)
If the problem eventually becomes that you're getting words that have too few letters, you could fix your problem by checking that a word is the length of the original word before you add it with:
if len(original_word) == len(checkword):
    savedWords.append(checkword)
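A different way to get only full anagrams, swapped in here as an alternative to the subset-plus-length check, is to compare letter counts with collections.Counter. The function name find_anagrams and the sample word list are assumptions for illustration.

```python
from collections import Counter


def find_anagrams(word, candidates):
    """Return the candidates that use exactly the same letters as `word`."""
    target = Counter(word)
    return [candidate for candidate in candidates if Counter(candidate) == target]


# Hypothetical word list standing in for the contents of sortedwords.txt
sample_words = ['top', 'tops', 'stop', 'pots', 'pot', 'spot']
print(find_anagrams('spot', sample_words))
```

Two Counter objects are equal only when every letter occurs the same number of times, so shorter words such as top and pot are excluded without a separate length test. The input word itself is returned too if it appears in the list.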
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I need to make a program in Python which looks through a given file, let's say acronyms.txt, and then returns the percentage of lines that contain at least one three-letter acronym.
For example:
NSW is a very large state.
It's bigger than TAS.
but WA is the biggest!
After reading this it should return 66.7% as 66.7% of the lines contain a three letter acronym. It is also rounded to the first decimal place as you can see. I am not very familiar with regex but I think it would be simplest with regex.
EDIT:
I have finished the code, but I need it to recognize acronyms with dots between the letters, e.g. N.S.W should be recognized as an acronym. How do I do this?
Any help would be appreciated!
You can do:
import re
cnt = 0
with open('acronyms.txt') as myfile:
    lines = myfile.readlines()
length = len(lines)
for line in lines:
    if re.search(r'\b[A-Z]{3}\b', line) is not None:
        cnt += 1
print("{:.1f}%".format(cnt/length*100))
r'\b[A-Z]{3}\b' matches exactly three capital letters in a row that form a whole word. If a match is found, we increment the count.
Then we simply divide the count by the number of lines, and print the result in the format you have shown.
You can do something like:
import re
total_lines = 0
matched_lines = 0
for line in open("filename"):
    total_lines += 1
    matched_lines += bool(re.search(r"\b[A-Z]{3}\b", line))
# %.1f rounds to one decimal place, as the question asks
print "%.1f%%" % (float(matched_lines) / total_lines * 100)
Note the '\b' in the search pattern: it matches the empty string at the beginning or end of a word. It prevents unwanted matches on acronyms longer than three letters ('asdf ASDF asdf') and on capital letters inside a word ('asdfASDasdf').
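Neither answer covers the dotted form from the EDIT (N.S.W). One way to extend the pattern is sketched below. The pattern, the function name percent_with_acronym, and the choice to accept only the forms NSW and N.S.W are assumptions, not something the original answers state. It uses lookarounds instead of \b because a dot is not a word character.

```python
import re

# Three capitals, either written together (NSW) or separated by dots (N.S.W).
# The lookbehind rejects a preceding letter or dot; the lookahead rejects a
# following letter, with or without a dot in front of it.
ACRONYM = re.compile(
    r"(?<![A-Za-z.])(?:[A-Z]{3}|[A-Z]\.[A-Z]\.[A-Z])(?!\.?[A-Za-z])"
)


def percent_with_acronym(lines):
    """Return the percentage of lines containing a three-letter acronym."""
    if not lines:
        return 0.0
    matched = sum(1 for line in lines if ACRONYM.search(line))
    return matched / len(lines) * 100


# The example lines from the question
sample_lines = [
    "NSW is a very large state.",
    "It's bigger than TAS.",
    "but WA is the biggest!",
]
print("{:.1f}%".format(percent_with_acronym(sample_lines)))
```

With a file, the list of lines could come from readlines() inside a with open('acronyms.txt') block, as in the first answer.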