When I try to run this, it says file not found. Are there any mistakes I've made?
from wordcloud import WordCloud
from wordcloud import STOPWORDS
import sys
import os
import matplotlib.pyplot as plt
os.chdir(sys.path[0])
text = open('pokemon.txt', mode='r', encoding='utf-8').read()
stop = STOPWORDS
print(stop)
Since your file is in the same folder as the Python program, use ./ before your pokemon.txt like this:
text = open('./pokemon.txt', mode='r', encoding='utf-8').read()
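Note that a relative path such as pokemon.txt or ./pokemon.txt is resolved against the current working directory, which is not always the script's folder. If the error persists, one option is to build the path from the script's own location. This is only a minimal sketch: read_text_next_to is a hypothetical helper name, not something from wordcloud or the standard library.

```python
from pathlib import Path


def read_text_next_to(script_file, filename, encoding="utf-8"):
    """Read `filename` from the folder that contains `script_file`.

    Hypothetical helper for illustration: the path is anchored to the
    script's location, so the current working directory does not matter.
    """
    folder = Path(script_file).resolve().parent
    return (folder / filename).read_text(encoding=encoding)
```

Inside the script itself this would be called as read_text_next_to(__file__, 'pokemon.txt'), which also makes the os.chdir call unnecessary.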
Related
I'm trying to write a program in Python 3. I have saved two raw text files in the same directory as "programm.py", but the program can't find them.
I'm using emacs, and I wrote:
from __future__ import division
import nltk, sys, matplotlib, numpy, re, pprint, codecs
from os import path
text1 = "/home/giovanni/Scrivania/Giovanni/programmi/Esame/Milton.txt"
text2 = "/home/giovanni/Scrivania/Giovanni/programmi/Esame/Sksp.txt"
from nltk import ngrams
s_tokenizer = nltk.data.load("tokenizers/punkt/english.pickle")
w_tokenizer = nltk.word_tokenize("text")
print(text1)
But when I import it in Python 3, it doesn't print text1 (I added the print to check that it works):
>>> import programma1
>>>
Instead, if I ask for text1 directly in the interpreter, it can't find it:
>>> import programma1
>>> text1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'text1' is not defined
What can I do?
There are a few independent issues here. As @Yash Kanojia correctly pointed out, to get the contents of the files you need to open and read them, rather than just hold their paths.
The reason you can't access text1 is that import programma1 only binds the name programma1 in your session; text1 is defined inside that module, not in the interpreter's own namespace. To access it, use programma1.text1.
I've also moved all the import statements to the top of programma1.py as it's seen as good practice :)
Full code:
programma1.py:
from __future__ import division
import nltk, sys, matplotlib, numpy, re, pprint, codecs
from nltk import ngrams
from os import path
with open("/home/giovanni/Scrivania/Giovanni/programmi/Esame/Milton.txt") as file1:
    text1 = file1.read()
with open("/home/giovanni/Scrivania/Giovanni/programmi/Esame/Sksp.txt") as file2:
    text2 = file2.read()
s_tokenizer = nltk.data.load("tokenizers/punkt/english.pickle")
w_tokenizer = nltk.word_tokenize("text")
#print(text1)
main.py:
import programma1
print(programma1.text1)
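To see the namespace behaviour in isolation, here is a small self-contained sketch. It writes a throwaway module to a temporary folder; demo_mod and its contents are made up for the demonstration and stand in for programma1.

```python
import sys
import tempfile
from pathlib import Path

# Write a tiny stand-in module (hypothetical name demo_mod) to a temp folder.
tmp = tempfile.mkdtemp()
Path(tmp, "demo_mod.py").write_text('text1 = "contents of Milton.txt"\n', encoding="utf-8")
sys.path.insert(0, tmp)

import demo_mod  # binds only the name demo_mod in this namespace

via_module = demo_mod.text1            # attribute lookup on the module works
name_is_bound = "text1" in globals()   # no bare text1 was created by the import

from demo_mod import text1             # this form binds text1 directly
```

So either programma1.text1 after import programma1, or a bare text1 after from programma1 import text1, should work.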
EDIT:
I presume you wanted to load the contents of the files into the tokenizer. If you do, replace this line:
w_tokenizer = nltk.word_tokenize("text")
with this line:
w_tokenizer = nltk.word_tokenize(text1 + "\n" + text2)
Hope this helps.
with open('/home/giovanni/Scrivania/Giovanni/programmi/Esame/Milton.txt') as f:
    data = f.read()
print(data)
I have code that imports a txt file and tokenizes it into words using the NLTK library (just like it is done in https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk). I did almost everything I needed easily, but I'm struggling to build a word cloud from the words I have now, and I still have no clue after hours of searching the web.
This is my code so far:
# Load libraries
!pip install nltk
import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
# Import file
f = open('PNAD2002.txt','r')
pnad2002 = ""
while 1:
    line = f.readline()
    if not line: break
    pnad2002 += line
f.close()
tokenized_word=word_tokenize(pnad2002)
tokenized_word_2 = [w.lower() for w in tokenized_word]
I wanted to use the following code (from https://github.com/amueller/word_cloud/blob/master/examples/simple.py):
# Read the whole text.
text = open(path.join(d, 'constitution.txt')).read()
# Generate a word cloud image
wordcloud = WordCloud().generate(text)
# Display the generated image:
# the matplotlib way:
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
# lower max_font_size
wordcloud = WordCloud(max_font_size=40).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
But I don't know how to use my tokenized words with this.
You need to instantiate a WordCloud object, then call generate_from_text:
wc = WordCloud()
img = wc.generate_from_text(' '.join(tokenized_word_2))
img.to_file('wordcloud.jpeg') # example of something you can do with the img
There are many customization options you can pass to WordCloud; you can find examples online, such as this one: https://www.datacamp.com/community/tutorials/wordcloud-python
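Since the words are already tokenized, another option is to count them yourself and hand the counts to WordCloud.generate_from_frequencies instead of joining them back into one string. The following is only a sketch: token_frequencies and cloud_from_tokens are hypothetical helper names, and the filtering rule (keep alphabetic, non-stopword tokens) is an assumption about what you want in the cloud.

```python
from collections import Counter


def token_frequencies(tokens, stopwords=frozenset()):
    """Count lowercased alphabetic tokens that are not in `stopwords`."""
    cleaned = (t.lower() for t in tokens)
    return Counter(t for t in cleaned if t.isalpha() and t not in stopwords)


def cloud_from_tokens(tokens, stopwords=frozenset()):
    """Build a WordCloud from pre-tokenized words (requires the wordcloud package)."""
    # Imported lazily so the counting helper works without wordcloud installed.
    from wordcloud import WordCloud
    return WordCloud().generate_from_frequencies(token_frequencies(tokens, stopwords))
```

With your variables this would be cloud_from_tokens(tokenized_word), and the result can be passed to plt.imshow as in the example you quoted. Bear in mind that the STOPWORDS set shipped with wordcloud is English, so for a Portuguese text you would probably supply your own set.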
I get the error:
UsageError: Line magic function `%cd..` not found.
when running Python code from a shell command that I usually run from Jupyter Notebook.
I use %cd and %ls all the time in Jupyter notebooks and don't understand why I can't run them from the shell.
I tried both:
python test.py
and
ipython test.py
This is the relevant part of my code:
import csv
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import pandas as pd
import sys
import os
import IPython
from scipy.misc import imread
import matplotlib.cbook as cbook
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
def main():
    script = sys.argv[0]
    map_name = sys.argv[1]
    callPenalty()
def callPenalty():
    %cd standalone-penalty
    os.system("octave-cli penalty.m map_bit.bmp 50 1 1 150 150")
    %cd..
main()
Does anyone know how to solve that?
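%cd is an IPython magic rather than Python syntax, so it is not available in a plain .py script. One possible workaround, sketched here with only the standard library, is to let subprocess run the command in the target folder. run_in_dir and call_penalty are hypothetical names, and the sketch assumes octave-cli is on the PATH and that standalone-penalty sits next to where the script is started.

```python
import subprocess


def run_in_dir(command, directory):
    """Run `command` (a list of arguments) with `directory` as its working directory."""
    return subprocess.run(command, cwd=directory, check=True,
                          capture_output=True, text=True)


def call_penalty():
    # Same Octave invocation as in the question, executed inside
    # standalone-penalty. The caller's own working directory is left
    # unchanged, so there is nothing to undo afterwards.
    return run_in_dir(
        ["octave-cli", "penalty.m", "map_bit.bmp", "50", "1", "1", "150", "150"],
        "standalone-penalty",
    )
```

If you really need to change the directory of the Python process itself, os.chdir("standalone-penalty") and os.chdir("..") are the plain-Python counterparts of the two magics.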
I wrote code to find the POS tags for Arabic words in my Python 2.7 shell, and the output was not correct. I found this solution on Stack Overflow:
Unknown symbol in nltk pos tagging for Arabic
and I downloaded all the files needed (stanford-postagger-full-2018-02-27), which the code in that question uses.
This is the code from that question, as I entered it in my shell:
# -*- coding: cp1256 -*-
from nltk.tag import pos_tag
from nltk.tag.stanford import POSTagger
from nltk.data import load
from nltk.tokenize import word_tokenize
_POS_TAGGER = 'taggers/maxent_treebank_pos_tagger/english.pickle'
def pos_tag(tokens):
    tagger = load(_POS_TAGGER)
    return tagger.tag(tokens)
path_to_model= 'D:\StanfordParser\stanford-postagger-full-2018-02-27\models/arabic.tagger'
path_to_jar = 'D:\StanfordParser\stanford-postagger-full-2018-02-27/stanford-postagger-3.9.1.jar'
artagger = POSTagger(path_to_model, path_to_jar, encoding='utf8')
artagger._SEPARATOR = '/'
tagged_sent = artagger.tag(u"أنا تسلق شجرة")
print(tagged_sent)
and the output was:
Traceback (most recent call last):
  File "C:/Python27/Lib/mo.py", line 4, in <module>
    from nltk.tag.stanford import POSTagger
ImportError: cannot import name POSTagger
How can I solve this error?
This script works without errors on my PC (recent NLTK releases name the class StanfordPOSTagger, not POSTagger), but the tagger results do not look very good:
import nltk
from nltk import *
from nltk.tag.stanford import StanfordPOSTagger
import os
java_path = "Put your local path in here/Java/javapath/java.exe"
os.environ['JAVAHOME'] = java_path
path_to_model= ('Put your local path in here/stanford-postagger-full-2017-06-09/models/arabic.tagger')
path_to_jar = ('Put your local path in here/stanford-postagger-full-2017-06-09/stanford-postagger.jar')
artagger = StanfordPOSTagger(path_to_model, path_to_jar, encoding='utf8')
artagger._SEPARATOR = "/"
tagged_sent = artagger.tag("أنا أتسلق شجرة".split())
print(tagged_sent)
The results:
[('أنا', 'VBD'), ('أتسلق', 'NN'), ('شجرة', 'NN')]
Give it a try and see :-)
All my files are in the same directory.
I'm new to Python and I'm trying to write functions in a Preprocessing file like this:
#Preprocessing file
from dateutil import parser
def dropOutcomeSubtype(DataFrame):
    # inplace expects a boolean, not the string 'True'
    DataFrame.drop('OutcomeSubtype', axis=1, inplace=True)
def convertTimestampToTime(Serie):
    for i in range(0, len(Serie)):
        parser.parse(Serie[i]).time()
And then I'm trying to use it in an Exporting file like this:
#Import external libraries
import pandas as pd
import numpy as np
import re
#Import our library
from Preprocessing import convertTimestampToTime, dropOutcomeSubtype
#Reading
Datas = pd.read_csv("../Csv/train.csv", sep=",", na_values=['NaN'])
dropOutcomeSubtype(Datas)
convertTimestampToTime(Datas.DateTime)
And when I try to run the code in my OSX shell with this config:
Python 3.5.2 |Anaconda 4.2.0 (x86_64)| IPython 5.1.0
I get this error: cannot import name 'convertTimestampToTime'
and if I change my import statement like this:
from Preprocessing import *
I get this error: name 'convertTimestampToTime' is not defined
Can you explain why, please?
Thank you in advance
In this case you can add the module's path to sys.path. If both files are in the same directory, add this code at the top of your main script:
import os
import sys
here = os.path.abspath(os.path.dirname(__file__))
sys.path.append(here)
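The same idea can be wrapped in a small function so the folder is only added once. This is just a sketch: ensure_on_sys_path is a hypothetical helper name, not part of the standard library.

```python
import os
import sys


def ensure_on_sys_path(script_file):
    """Put the folder containing `script_file` on sys.path (once) and return it."""
    here = os.path.abspath(os.path.dirname(script_file))
    if here not in sys.path:
        sys.path.append(here)
    return here
```

In the Exporting file you would call ensure_on_sys_path(__file__) before from Preprocessing import .... If the import still fails, it may be worth running import Preprocessing followed by print(Preprocessing.__file__) to check which file Python actually picked up, in case another Preprocessing module or an older copy is found first.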