Unable to import a column from csv into NLTK using python - python

I am unable to tokenize data from a CSV file using NLTK. This is my code:
import nltk
import csv
import numpy
from nltk import sent_tokenize
from nltk import word_tokenize
from nltk import pos_tag
reader = csv.reader(open('/Users/yoshithKotla/Desktop/dingdang/tweets.csv', 'rU'), delimiter= ",",quotechar='|')
tokenData = nltk.word_tokenize(reader)

You can use the Pandas library.
import pandas as pd
from nltk import word_tokenize

data = pd.read_csv('/Users/yoshithKotla/Desktop/dingdang/tweets.csv')
# Assuming the tweets are in the 'text' column.
texts = list(data['text'].values)
tokenData = [word_tokenize(tweet) for tweet in texts]
tokenData is a list of word-tokenized lists.
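If you would rather keep the csv module from the original snippet, the likely fix is to tokenize the text inside each row instead of passing the reader object itself to word_tokenize. A minimal sketch (the in-memory CSV data is made up for illustration; `str.split` stands in for `nltk.word_tokenize`, which you can substitute once the punkt tokenizer data is downloaded):

```python
import csv
import io

# a tiny in-memory CSV standing in for tweets.csv (hypothetical data)
raw = '|hello world|,first\n|good morning|,second\n'

tokenData = []
reader = csv.reader(io.StringIO(raw), delimiter=',', quotechar='|')
for row in reader:
    # tokenize the first column of each row (adjust the index for your file);
    # swap str.split for nltk.word_tokenize if you need real NLTK tokenization
    tokenData.append(row[0].split())

print(tokenData)  # [['hello', 'world'], ['good', 'morning']]
```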

Related

File not found in python (WORDCLOUD)

When I try to run this it says file not found. Are there any mistakes I've made?
from wordcloud import WordCloud
from wordcloud import STOPWORDS
import sys
import os
import matplotlib.pyplot as plt
os.chdir(sys.path[0])
text = open('pokemon.txt', mode='r', encoding='utf-8').read()
stop = STOPWORDS
print(stop)
Since your file is in the same folder as the Python program, use ./ before your pokemon.txt like this:
text = open('./pokemon.txt', mode='r', encoding='utf-8').read()
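A more robust alternative (a sketch, not from the original answer) is to build the path from the script's own location with pathlib, which works no matter what the current working directory is. The helper name below is made up for illustration:

```python
from pathlib import Path

def read_text_next_to(script_path, filename):
    # resolve filename relative to the folder containing script_path,
    # independent of the process's current working directory
    return (Path(script_path).resolve().parent / filename).read_text(encoding='utf-8')

# in the word-cloud script this would be:
# text = read_text_next_to(__file__, 'pokemon.txt')
```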

Pandas code running but not doing anything

Below is the code:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
dataset=pd.read_excel('file_path')
sia=SentimentIntensityAnalyzer()
dataset['polarity scores']=dataset['column_title'].apply(lambda x: sia.polarity_scores(str(x))['compound'])
print("done")
I would like it to take the Excel file named/located at file_path and give me polarity scores for the text in the column entitled column_title, but I'm not sure what I'm doing wrong. The code runs without any errors, but it does not edit the Excel file at all.
You forgot to save your file.
Use dataset.to_excel('output_file_path.xlsx') to save it:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
dataset=pd.read_excel('file_path')
sia=SentimentIntensityAnalyzer()
dataset['polarity scores']=dataset['column_title'].apply(lambda x: sia.polarity_scores(str(x))['compound'])
dataset.to_excel('output_file_path.xlsx', index = False) # save file
print("done")
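The apply/dict pattern in that line is worth isolating: polarity_scores returns a dict, and the lambda pulls out just its 'compound' key per row. A minimal sketch with a made-up stand-in scorer (so it runs without the VADER lexicon; the scores and column name here are illustrative only):

```python
import pandas as pd

def fake_polarity_scores(text):
    # hypothetical stand-in for sia.polarity_scores: same dict shape as VADER
    score = 0.5 if 'good' in text else -0.5
    return {'neg': 0.0, 'neu': 0.5, 'pos': 0.5, 'compound': score}

dataset = pd.DataFrame({'column_title': ['good product', 'bad product']})
# keep only the 'compound' value from each per-row score dict
dataset['polarity scores'] = dataset['column_title'].apply(
    lambda x: fake_polarity_scores(str(x))['compound'])

print(dataset['polarity scores'].tolist())  # [0.5, -0.5]
```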

words stemming and save output to text file

I have this code; it stems words from a text file and saves the output to another text file.
The problem is that it only stems the first word and ignores the others.
Any help?
import nltk
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer

stemmer = PorterStemmer()
with open(r'file path', 'r') as fp:
    tokens = fp.readlines()
for t in tokens:
    s = stemmer.stem(t.strip())
    print(s, file=open("output.txt", "a"))
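The thread has no answer, but the likely cause is that each whole line is passed to the stemmer as if it were a single word. One sketch of a fix (an assumption, not from the thread) is to split each line into words and stem them individually:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_line(line):
    # stem each word individually instead of the whole line at once
    return ' '.join(stemmer.stem(w) for w in line.split())

print(stem_line('caresses ponies cats'))  # caress poni cat
```

When writing the results, also open the output file once before the loop (e.g. `with open('output.txt', 'w') as fout:`) rather than reopening it in append mode on every iteration as the original does.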

tweet = textblob(tweet) TypeError: 'module' object is not callable

tweet = textblob(tweet)
TypeError: 'module' object is not callable
I have this problem while trying to run a sentiment analysis script. I have installed textblob with the following commands:
$ pip install -U textblob
$ python -m textblob.download_corpora
the code is the following:
import json
import csv
from textblob import TextBlob

# set the input and output files
input_file = "tweets.json"
output_file = "results.csv"

# store all json data
tweets_novartis = []
with open(input_file) as input_novartis:
    for line in input_novartis:
        tweets_novartis.append(json.loads(line))

# open output file to store the results
with open(output_file, "w") as output_novartis:
    writer = csv.writer(output_novartis)
    # iterate through all the tweets
    for tweets_novartis in tweets_novartis:
        tweet = tweets_novartis["full_text"]
        # TextBlob to calculate sentiment
        tweet = Textblob(tweet)
        tweet = tweet.replace("\n", " ")
        tweet = tweet.replace("\r", " ")
        sentiment = [[tweet.sentiment.polarity]]
        writer.writerows(sentiment)
The textblob package itself is not callable; import the TextBlob class from the textblob package instead.
from textblob import TextBlob
b = TextBlob('i hate to go to school')
print(b.sentiment)
You need to import TextBlob from the textblob module:
from textblob import TextBlob
tweet = TextBlob(tweet)
From the Docs:
>>> from textblob import TextBlob
>>> wiki = TextBlob("Python is a high-level, general-purpose programming language.")
More info
Python is case sensitive. Use this:
from textblob import TextBlob
tweet = TextBlob(tweet)
python != vba
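The underlying error is more general than textblob: a module object is simply not callable in Python, while the classes and functions inside it are. A quick self-contained demonstration using the standard library:

```python
import math

print(callable(math))       # False: a module object cannot be called
print(callable(math.sqrt))  # True: the functions inside it can

try:
    math(4)
except TypeError as err:
    print(err)  # 'module' object is not callable
```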

How to tokenize natural English text in an input file in python?

I want to tokenize an input file in Python. Please advise; I am a new Python user.
I have read a bit about regular expressions but am still confused, so please suggest a link or code overview for this.
Try something like this:
import nltk

file_content = open("myfile.txt").read()
tokens = nltk.word_tokenize(file_content)
print(tokens)
The NLTK tutorial is also full of easy to follow examples: https://www.nltk.org/book/ch03.html
Using NLTK
If your file is small:
Open the file with the context manager with open(...) as x,
then do a .read() and tokenize it with word_tokenize()
[code]:
from nltk.tokenize import word_tokenize

with open('myfile.txt') as fin:
    tokens = word_tokenize(fin.read())
If your file is larger:
Open the file with the context manager with open(...) as x,
read the file line by line with a for-loop
tokenize the line with word_tokenize()
output to your desired format (with the write flag set)
[code]:
from __future__ import print_function
from nltk.tokenize import word_tokenize

with open('myfile.txt') as fin, open('tokens.txt', 'w') as fout:
    for line in fin:
        tokens = word_tokenize(line)
        print(' '.join(tokens), end='\n', file=fout)
Using SpaCy
from __future__ import print_function
from spacy.tokenizer import Tokenizer
from spacy.lang.en import English

nlp = English()
tokenizer = Tokenizer(nlp.vocab)
with open('myfile.txt') as fin, open('tokens.txt', 'w') as fout:
    for line in fin:
        doc = tokenizer(line)  # a spaCy Tokenizer is called directly, not via .tokenize()
        print(' '.join(t.text for t in doc), end='\n', file=fout)
