I'm writing a hangman program in Python and I've come across a problem when passing in a file that has both multi-word and single-word lines.
FILE:
hello brown fox
dog
cat
water
jump
# initialize list
wordList = []

# get and open file
getFile = raw_input("Enter file name: ")
filename = open(getFile, "r")

def readWords(filename):
    for line in filename:
        # split any multi word line
        line.split()
        # add line to wordList
        wordList.append(line)
Yet the output for wordList still reads:
wordList = ['hello brown fox\n', 'dog\n', 'cat\n', 'water\n', 'jump\n']
I am trying to make it so that 'hello brown fox' appears as 3 separate strings.
You are making this too complicated - just .split the entire file contents:
with open(getFile, "r") as f:
    words = f.read().split()
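With the sample file from the question, words would then read:
['hello', 'brown', 'fox', 'dog', 'cat', 'water', 'jump']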
The problem you're having is that you're splitting but not saving the split line:
>>> a = "hello brown fox"
>>> a.split()
['hello', 'brown', 'fox']
>>> a
'hello brown fox'
>>>
so:
wordList.extend(line.split())
should do the trick for you
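Putting that together, a minimal sketch of the fixed reader (same setup as in the question, just saving the split result):
wordList = []

def readWords(f):
    for line in f:
        # extend() adds each word separately; split() also drops the trailing '\n'
        wordList.extend(line.split())

getFile = raw_input("Enter file name: ")
with open(getFile, "r") as f:
    readWords(f)
# wordList is now ['hello', 'brown', 'fox', 'dog', 'cat', 'water', 'jump']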
The split function returns the result as a list, so appending it directly would probably not be what you desire. You can try the following example:
def main():
    l_wordList = ['hello brown fox\n', 'dog\n', 'cat\n', 'water\n', 'jump\n']
    l_words_list = []
    for word in l_wordList:
        # split() always returns a list, so just collect its tokens
        for token in word.split():
            l_words_list.append(token)
    for word in l_words_list:
        print(word)

main()
And the result will be this
>>>
hello
brown
fox
dog
cat
water
jump
>>>
Regards,
Dariyoosh
Related
Not sure how to remove the "\n" thing at the end of output
Basically, I have this txt file with sentences such as:
"What does Bessie say I have done?" I asked.
"Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child
taking up her elders in that manner.
Be seated somewhere; and until you can speak pleasantly, remain silent."
I managed to split the sentences by semicolon with code:
import re

with open("testing.txt") as file:
    read_file = file.readlines()
for i, word in enumerate(read_file):
    low = word.lower()
    re.split(';', low)
But I'm not sure how to count the words of the split sentences, as len() doesn't work:
The output of the sentences:
['"what does bessie say i have done?" i asked.\n']
['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a
child taking up her elders in that manner.\n']
['be seated somewhere', ' and until you can speak pleasantly, remain silent."\n']
For the third sentence, for example, I am trying to count the 3 words on the left and the 8 words on the right.
Thanks for reading!
The number of words is the number of spaces plus one:
e.g.
Two spaces, three words:
World is wonderful
Code:
import re
import string

with open('file.txt', 'r') as f:
    lines = f.readlines()

DELIMITER = ';'
word_count = []
for sentence in lines:
    # Skip empty sentences
    if not sentence.strip():
        continue
    # Remove punctuation besides our delimiter ';'
    sentence = sentence.translate(str.maketrans('', '', string.punctuation.replace(DELIMITER, '')))
    # Split by our delimiter
    splitted = re.split(DELIMITER, sentence)
    # The number of words is the number of spaces plus one
    word_count.append([1 + x.strip().count(' ') for x in splitted])

# [[9], [7, 9], [7], [3, 8]]
print(word_count)
Use str.rstrip('\n') to remove the \n at the end of each sentence.
To count the words in a sentence, you can use len(sentence.split(' '))
To transform a list of sentences into a list of counts, you can use the map function.
So here it is:
import re

with open("testing.txt") as file:
    for line in file.readlines():
        # Ignore empty lines
        if line.strip(' ') != '\n':
            line = line.lower()
            # Split by semicolons
            parts = re.split(';', line)
            print("SENTENCES:", parts)
            counts = list(map(lambda part: len(part.split()), parts))
            print("COUNTS:", counts)
Outputs
SENTENCES: ['"what does bessie say i have done?" i asked.']
COUNTS: [9]
SENTENCES: ['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child ']
COUNTS: [7, 9]
SENTENCES: [' taking up her elders in that manner.']
COUNTS: [7]
SENTENCES: ['be seated somewhere', ' and until you can speak pleasantly, remain silent."']
COUNTS: [3, 8]
You'll need the nltk library:
from nltk import sent_tokenize, word_tokenize

mytext = """I have a dog.
The dog is called Bob."""

for sent in sent_tokenize(mytext):
    print(len(word_tokenize(sent)))
Output
5
6
Step by step explanation:
for sent in sent_tokenize(mytext):
    print('Sentence >>>', sent)
    print('List of words >>>', word_tokenize(sent))
    print('Count words per sentence >>>', len(word_tokenize(sent)))
Output:
Sentence >>> I have a dog.
List of words >>> ['I', 'have', 'a', 'dog', '.']
Count words per sentence>>> 5
Sentence >>> The dog is called Bob.
List of words >>> ['The', 'dog', 'is', 'called', 'Bob', '.']
Count words per sentence>>> 6
import re

sentences = []  # empty list for storing the result
with open('testtext.txt') as fileObj:
    # make a list of lines, already stripped of '\n's
    lines = [line.strip() for line in fileObj if line.strip()]
    for line in lines:
        # split lines by ';' and store the result in sentences
        sentences += re.split(';', line)
for sentence in sentences:
    # print each sentence followed by its word count
    print(sentence + ' ' + str(len(sentence.split())))
Try this one:
import re

with open("testing.txt") as file:
    read_file = file.readlines()
for word in read_file:
    # strip() also removes the trailing '\n'
    low = word.lower().strip()
    parts = re.split(';', low)
I have a list of words, e.g.:
['bad', 'good', 'smart']
and I need to find which lines in a file contain all of those words. I've tried:
for line in file_text:
    if [w for w in list] in line:
        print(line)
However I get the following error:
TypeError: 'in <string>' requires string as left operand, not list
What is an easy pythonic way of doing this?
I think you want to use all (or any):
>>> lst = ["the", "fox", "brown"]
>>> sentence = "the quick brown fox jumps over the lazy dog"
>>> all(word in sentence for word in lst)
True
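Applied to the question's loop, a minimal sketch (reusing the question's file_text, with the list renamed to words since list shadows the built-in):
words = ['bad', 'good', 'smart']
for line in file_text:
    # keep the line only if every word occurs somewhere in it;
    # swap all for any to match lines containing at least one word
    if all(w in line for w in words):
        print(line)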
One-liner:
matching = [line for line in file_text if all(w in line for w in words)]
I am relatively new to Python and have a question:
I am trying to write a script that will read a .txt file, check whether its words are in a list that I've provided, and then return a count of how many of the file's words were in that list.
So far,
import string

# this is just an example of a list
list = ['hi', 'how', 'are', 'you']

filename = "hi.txt"
infile = open(filename, "r")
lines = infile.readlines()

for line in lines:
    words = line.split()
    for word in words:
        word = word.strip(string.punctuation)
I've tried to split the file into lines and then the lines into words without punctuation.
I am not sure where to go after this. I would like ultimately for the output to be something like this:
"your file has x words that are in the list".
Thank you!
You can split your file into words using the following command:
words=reduce(lambda x,y:x+y,[line.split() for line in f])
Then count the occurrences of each word in your word list by looping over it and using the count function:
w_list = ['hi', 'how', 'are', 'you']
with open("hi.txt", "r") as f:
    words = reduce(lambda x, y: x + y, [line.split() for line in f])
for w in w_list:
    print "your file has {} {}".format(words.count(w), w)
# words to search for
# (stored as a set so `word in search_for` is O(1))
search_for = set(["hi", "how", "are", "you"])

# get search text
# (no need to split into lines)
with open("hi.txt") as inf:
    text = inf.read().lower()

# create translation table
# - converts digits and non-word chars to spaces (this maintains appropriate word-breaks)
# - keeps apostrophe (for words like "don't" or "couldn't")
trans = str.maketrans(
    "0123456789abcdefghijklmnopqrstuvwxyz'!#$%&()*+,-./:;<=>?@[\\]^_`{|}~\"\\",
    " " * 10 + "abcdefghijklmnopqrstuvwxyz'" + " " * 32
)

# apply translation table and split into words
words = text.translate(trans).split()

# count desired words
word_count = sum(word in search_for for word in words)

# show result
print("your file has {} words that are in the list".format(word_count))
Read the file content with a with statement and open().
Remove punctuation from the file content using the string module.
Split the file content with split() and iterate over every word with a for loop.
Check whether each word is present in the input list and increment the count accordingly.
input file: hi.txt
hi, how are you?
hi, how are you?
code:
import string

input_list = ['hi', 'how', 'are', 'you']
filename = "hi.txt"
count = 0
with open(filename, "rb") as fp:
    data = fp.read()
    # remove punctuation (Python 2: string.maketrans)
    data = data.translate(string.maketrans("", ""), string.punctuation)
    for word in data.split():
        if word in input_list:
            count += 1

print "Total number of word present in file from the list are %d" % (count)
Output:
vivek@vivek:~/Desktop/stackoverflow$ python 18.py
Total number of word present in file from the list are 8
vivek@vivek:~/Desktop/stackoverflow$
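Note the code above is also Python 2 (string.maketrans, the print statement). A rough Python 3 equivalent of the counting step, assuming the same hi.txt:
import string

input_list = ['hi', 'how', 'are', 'you']
count = 0
with open("hi.txt") as fp:
    data = fp.read()
    # in Python 3, str.maketrans's third argument lists characters to delete
    data = data.translate(str.maketrans('', '', string.punctuation))
    for word in data.split():
        if word in input_list:
            count += 1
print("Total number of words present in file from the list are %d" % count)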
Do not use variable names which are already defined by the Python interpreter,
e.g. list in your code.
>>> list
<type 'list'>
>>> a = list([1,2,3])
>>> a
[1, 2, 3]
>>> list = ["hi", "how"]
>>> b = list([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
>>>
Use len(). For example, for a list:
myList = ["c", "b", "a"]
len(myList)
it will return 3, meaning there are three items in your list.
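Tying that back to the question, a minimal sketch (assuming the question's example list and hi.txt, and collecting the matches so len() can count them):
import string

word_list = ['hi', 'how', 'are', 'you']
matches = []
with open("hi.txt") as infile:
    for line in infile:
        for word in line.split():
            # strip surrounding punctuation before comparing, as in the question
            if word.strip(string.punctuation) in word_list:
                matches.append(word)
print("your file has {} words that are in the list".format(len(matches)))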
For example:
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']
I want to be able to filter out the dog and cat so it would read:
item=['the is gone', 'the and is gone']
item1 = []
for w in words:
    for line in item:
        if w in line:
            j = gg.replace(it, '')
            item1.append(j)
I get the following:
['the is gone', 'the cat and is gone', 'the and dog is gone']
You're looping over all lines for each word and appending each replacement. You should switch those loops:
item1 = []
for line in item:
    for w in words:
        line = line.replace(w, '')
    item1.append(line)
Note: I altered some code:
changed gg to line
changed it to w
removed the check whether line contains w, as that is handled by replace
replace does not know about word boundaries. If you want to remove entire words only, you should try a different approach using re.sub:
import re

item1 = []
for line in item:
    for w in words:
        line = re.sub(r'\b%s\b' % w, '', line)  # '\b' is a word boundary
    item1.append(line)
You might use this approach instead:
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

item2 = [" ".join([w for w in t.split() if w not in words]) for t in item]
print item2
>>> ['the is gone', 'the and is gone']
I need to create a lexer/parser which deals with input data of variable length and structure.
Say I have a list of reserved keywords:
keyWordList = ['command1', 'command2', 'command3']
and a user input string:
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
userInputList = userInput.split()
How would I go about writing this function:
INPUT:
tokenize(userInputList, keyWordList)
OUTPUT:
[['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']
I've written a tokenizer that can identify keywords, but have been unable to figure out an efficient way to embed the groups of non-keywords into lists one level deeper.
RE solutions are welcome, but I would really like to see the underlying algorithm as I am probably going to extend the application to lists of other objects and not just strings.
Something like this:
def tokenize(lst, keywords):
    cur = []
    for x in lst:
        if x in keywords:
            yield cur
            yield x
            cur = []
        else:
            cur.append(x)
    # flush any trailing non-keywords
    if cur:
        yield cur
This returns a generator, so wrap the call in list().
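For example, with the input from the question:
>>> list(tokenize(userInputList, keyWordList))
[['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']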
That is easy to do with some regex:
>>> reg = r'(.+?)\s(%s)(?:\s|$)' % '|'.join(keyWordList)
>>> userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
>>> re.findall(reg, userInput)
[('The quick brown', 'command1'), ('fox jumped over', 'command2'), ('the lazy dog', 'command3')]
Now you just have to split the first element of each tuple.
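For completeness, one way that last splitting step might look:
>>> result = []
>>> for group, kw in re.findall(reg, userInput):
...     result.append(group.split())
...     result.append(kw)
...
>>> result
[['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']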
For more than one level of nesting, a regex may not be a good answer.
There are some nice parsers for you to choose on this page: http://wiki.python.org/moin/LanguageParsing
I think Lepl is a good one.
Try this:
keyWordList = ['command1', 'command2', 'command3']
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
inputList = userInput.split()

def tokenize(userInputList, keyWordList):
    keywords = set(keyWordList)
    tokens, acc = [], []
    for e in userInputList:
        if e in keywords:
            tokens.append(acc)
            tokens.append(e)
            acc = []
        else:
            acc.append(e)
    if acc:
        tokens.append(acc)
    return tokens

tokenize(inputList, keyWordList)
> [['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']
Or have a look at PyParsing, quite a nice little lexer/parser combination.