I am running the following code in Kaggle. The language set is "Python". Trying to do perform NLP on a paragraph and create visuals using the code below.
import matplotlib
# Force matplotlib to not use any Xwindows backend.
matplotlib.use('Agg')
import matplotlib.pyplot
import matplotlib.pyplot as plt
import pandas as pd
import itertools
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.tokenize import RegexpTokenizer
from nltk import RegexpParser
document= """The BBC has been testing a new service called SoundIndex, which
lists the top 1,000 artists based on discussions crawled from Bebo,
Last.fm, Google Groups, iTunes, MySpace and YouTube. The top five
bands according to SoundIndex right now are Coldplay, Rihanna, The
Ting Tings, Duffy and Mariah Carey , but the index is refreshed
every six hours. SoundIndex also lets users sort by popular tracks,
search by artist, or create customized charts based on music
preferences or filters by age range, sex or location. Results can
also be limited to just one data source (such as Last.fm).
"""
sentences = sent_tokenize(document)
sentences = [word_tokenize(sent) for sent in sentences]
sentences = [pos_tag(sent) for sent in sentences]
sentence = list(itertools.chain(*sentences))
#grammar = "NP: {<DT>?<JJ>*<NN>}"
grammar = """
NOUN_VERB_NOUN: {<DT>?<VB>*<NN.*>+}
GRUND_NOUN: {<VBG.><NN.*>+}
VN:{<VBN><NN>+}
NOUN_AND_ADJ: {<NN>?<JJ>*<NN.*>+}
{<N.*|JJ.*>*<N.*>} # Nouns and Adjectives, terminated with Nouns
NOUN_PHRASE: {<DT>?<JJ>*<NN>}
ADJ_PHRASE: {}
KEYPHRASE: {<DT>?<JJ>*<NN>}
KEYWORDS: {<NN.*>}
VERB_PHRASE: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>}
"""
cp = RegexpParser(grammar)
result = cp.parse(sentence)
#print(result)
result.draw()
for subtree in result.subtrees():
#if subtree.label() == "NOUN_VERB_NOUN":
#print("NOUN_VERB_NOUN: "+str(subtree.leaves()))
print(str(subtree.label())+" "+str(subtree.leaves()))
result.draw()
return(result)
The error I am getting on execution is:
TclError: no display name and no $DISPLAY environment variable
Could anyone please assist on how can I resolve this issue?
Related
as i know and i have read definition of the wordcloud is following :
Wordcloud is a popular technique that helps us identify the keywords in a text.
In a wordcloud, more frequent words have a larger and bolder font, while less frequent words have smaller or thinner fonts.
In Python, you can make simple wordclouds with the wordcloud library and nice-looking wordclouds with the stylecloudlibrary.
i have following code in order to plot those underlaine and keywords from the text :
import numpy as np
import matplotlib.pyplot as plt
import stylecloud
stylecloud.gen_stylecloud(file_path='SJ-Speech.txt',
icon_name= "fas fa-apple-alt")
plt.show()
expected output should be this :
but result is nothing :
C:\Users\User\PycharmProjects\AI_Project\venv\Scripts\python.exe C:/Users/User/PycharmProjects/AI_Project/Word_Clous_Example.py
Process finished with exit code 0
did i miss something?please help me
I am doing a project in which I need to estimate the age of an individual, given an X-Ray of their hand. I am given a testing set, which contains a large collection of images (in a folder on my computer), all NUMBERED, and I am also given a CSV file that corresponds each image number with 2 pieces of information: the age(in months), as well as whether the individual is male (this is given as "true" or "false." Also, I believe I have successfully imported both of these files into python(the image folder, as well as the CSV file)
I have looked at many TensorFlow tutorials, but I am struggling to figure out how I can associate the image numbers together, as well as train the data set. Any help would be greatly appreciated!!
I have attached blocks of my code, as well as how the data is presented to me, up until this point.
import pandas as pd
import numpy as np
import os
import tensorflow as tf
import cv2
from tensorflow import keras
from tensorflow.keras.layers import Dense, Input, InputLayer, Flatten
from tensorflow.keras.models import Sequential, Model
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
import random
%matplotlib inline
import matplotlib.pyplot as plt
--This simply imports libraries that I use, or anticipate using later on.
plt.figure(figsize=(20,20))
train_images=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for i in range(5):
file = random.choice(os.listdir(train_images))
image_path= os.path.join(train_images, file)
img=mpimg.imread(image_path)
ax=plt.subplot(1,5,i+1)
ax.title.set_text(file)
plt.imshow(img)
-- This successfully imports the image folder, as well as prints 5 random images to test if the importing worked.
This screenshot provides an example of how the pictures are depicted
IMG_WIDTH=200
IMG_HEIGHT=200
img_folder=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/'
-- I believe this resizes all the images to the specified dimensions
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv (r'/Users/FOLDER/downloads/train.csv')
print (train_labels)
-- This successfully imports the data from the CSV file, and prints it, to make sure it worked.
If you have any ideas on how to connect these two datasets and train the data, I would greatly appreciate it.
Thank you!
The approach is simple create a map between the image_data and the label. After that you can create two lists/np.array and use the same to pass the train and label info to you model. Following code should help in getting the same.
import os
import glob
dic = {}
# assuming you have .png format files else change the same into the glob statement
train_images='/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for file in glob.glob(train_images+'/*.png'):
b_name = os.path.basename(file).split('.')[0]
dic[b_name] = mpimg.imread(file)
dic_label_match = {}
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv (r'/Users/rkrishna/downloads/train.csv')
for i in range(len(train_labels)):
# given your first column is age and image no starts from 1
dic_label_match[i+1] = str(train_labels.iloc[i][0])
# you can use the below line too
# dic_label_match[i+1] = str(train_labels.iloc[i][age])
# now you have dict with keys and values
# create two lists / arrays and you can pass the same to the keram model
train_x = []
label_ = []
for val in dic:
if val in dic and val in dic_label_match:
train_x.append(dic[val])
label_.append(dic_label_match[val])
I am trying to use Gensim packages as written below:
import re, numpy as np, pandas as pd
from pprint import pprint
# Gensim
import gensim, spacy, logging, warnings
import gensim.corpora as corpora
from gensim.utils import lemmatize, simple_preprocess
from gensim.models import CoherenceModel
import matplotlib.pyplot as plt
But i keep getting the error:
ImportError: cannot import name 'lemmatize' from 'gensim.utils' (/Users/xxx/opt/anaconda3/envs/virt_env/lib/python3.9/site-packages/gensim/utils.py)
I am using gensim v4.0.1, Python 3.8, numpy 1.20.0.
Has anyone encountered this kinda problem lately? Thank you
Gensim only ever previously wrapped the lemmatization routines of another library (Pattern) – which was not a particularly modern/maintained option, so was removed from Gensim-4.0.
Users should choose & apply their own lemmatization operations, if any, as a preprocessing step before applying Gensim's algorithms. Some Python libraries offering lemmatization include:
Pattern (Gensim's previously-included option): https://github.com/clips/pattern
NLTK: https://www.nltk.org/api/nltk.stem.html#nltk.stem.wordnet.WordNetLemmatizer
UDPipe: https://ufal.mff.cuni.cz/udpipe
Spacy: https://spacy.io/api/lemmatizer
Stanza: https://stanfordnlp.github.io/stanza/
I used all the correct encoding and the software does work but the letters in Arabic aren't connected (most Arabic letters are connected when written next to each other)
This is how the plot shows the image
I used the python Pandas module using the bar chart function
The only way I could get around this is to take all the Arabic words in the pandas column that I need to plot and reshape it with arabic_reshaper, append them to a list and have this list as my x axis:
# reshaping the arabic words to show correctly on matplotlib
import arabic_reshaper
import matplotlib.pyplot as plt
from bidi.algorithm import get_display
x = [ ]
for item in df.column_name.values:
x.append(get_display(arabic_reshaper.reshape(item)))
Of course you need to install the arabic_reshaper and the bidi.algorithm packages first
I just wrote a script that extracts all the spoken text in the Dutch Parlement of a few thousand XML files. For every speaker it count the amount of times a speaker said some words.
After doing this I calculated the TF * IDF value of every word for each speaker in the Dutch Parlement. If you are not familiar with this see this link: TF IDF explanation
So now I have a dictionary for each speaker in the Dutch Parlement where the keys are the words he said and the values are the corresponding TF*IDF values:
{u'asielzoekers': 0.0034861170591325486,
u'belastingverlaging': 0.0018551991553514675,
u'buma': 0.0020712555982839408,
u'islam': 0.0029519544163739155,
u'moslims': 0.0027958002747301355,
u'ouderen': 0.0022803123245457566,
u'pechtold': 0.0021525864470786928,
u'president': 0.003281844532743345,
u'rutte': 0.0023488684001475584,
u'samsom': 0.0019304632325980841}
Right now I want to create a wordcloud from these values. I have shortly looked into the wordcloud module written by amueller But for as far as I can see this module is not working with a dictionary but just plain text.
So any help on how to create a wordcloud from a dictionary's values would be highly appreciated.
Thanks in advance!
dictionary= {u'asielzoekers': 0.0034861170591325486,.. u'samsom': 0.0019304632325980841}
from PIL import Image
import matplotlib.pyplot as plt
from wordcloud import WordCloud
wc = WordCloud(background_color="white",width=1000,height=1000, max_words=10,relative_scaling=0.5,normalize_plurals=False).generate_from_frequencies(dictionary)
plt.imshow(wc)
import matplotlib.pyplot as plt
from wordcloud import WordCloud
word_could_dict = {'Git':100, 'GitHub':100, 'push':50, 'pull':10, 'commit':80, 'add':30, 'diff':10,
'mv':5, 'log':8, 'branch':30, 'checkout':25}
wordcloud = WordCloud(width = 1000, height = 500).generate_from_frequencies(word_could_dict)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
And we get:
Creating Wordcloud with Dictionaries
05/02/21 | 5th of February 2021 |
Producing a visualised wordcloud image from dictionary name-value pair values with WordCloud's module method of the following: generate(), generate_from_text() and generate_from_frequencies()will not work after I tried so many times figuring it out how to overcome this problem.
Having checked on Stack Overflow if there's any way around to it, I tried replicating the solution from the above answers in my program. It did not resolve the issue I had which was a TypeError exception from creating & displaying a wordcloud image.
After checking the "WordCloud API Documentation" from their official module site, I found out you have to manually use something called "multidict". It's a Python module which acts like a dictionary utilising..
"[a] collection of key-value pairs where key might be occurred more than
once in the container"
- quoted from the Multidict's main PyPi introductory page
For more information on Multidict's module, click here: https://multidict.readthedocs.io/en/stable/
To check out their official GitHub repositiory, click the following: https://github.com/aio-libs/multidict
Extracted from "WordCloud's Gallery of Example" page, here is a snippet of using the multidict module to build a frequency dictionary of values visualised in a wordcloud display:
import multidict as multidict
...
def getFrequencyDictForText(sentence):
# instantiate multidict object
fullTermsDict = multidict.MultiDict()
tmpDict = {}
# making dict for counting frequencies
for text in sentence.split(" "):
...
val = tmpDict.get(text, 0)
tmpDict[text.lower()] = val + 1
for key in tmpDict:
fullTermsDict.add(key, tmpDict[key])
return fullTermsDict
def makeImage(text):
alice_mask = np.array(Image.open("alice_mask.png"))
# instantiate and define wordcloud properties
wc = WordCloud(background_color="white", max_words=1000, mask=alice_mask)
# generate wordcloud
wc.generate_from_frequencies(text)
# display and show "wc"
plt.imshow(wc, interpolation="bilinear")
plt.show()
...
...
Note: This is not the full source. To see the whole code, check out Amueller WordCloud website