Arabic letters aren't showing correctly in the plot - python

I used the correct encoding and the program runs, but the Arabic letters in the plot aren't connected (most Arabic letters join when written next to each other).
This is how the plot shows the image
I made the plot with the pandas bar chart function.

The only way I could get around this was to take all the Arabic words in the pandas column I needed to plot, reshape them with arabic_reshaper, append them to a list, and use that list as my x axis:
# reshaping the arabic words to show correctly on matplotlib
import arabic_reshaper
import matplotlib.pyplot as plt
from bidi.algorithm import get_display
x = []
for item in df.column_name.values:
    x.append(get_display(arabic_reshaper.reshape(item)))
Of course, you need to install the arabic_reshaper and bidi packages first (on PyPI they are published as arabic-reshaper and python-bidi).

Related

How to render math symbols as text in SVG/EPS/PDF images?

When creating graphs with, for instance, Python, it is possible to save the figures as vector graphics (SVG, EPS, PDF) so that the text is rendered separately. This makes it possible to select or search the text when the figure is shown in a PDF file. However, when I render a simple graph that uses math symbols (in LaTeX) in addition to text, the math symbol gets encoded as part of the image rather than as text.
Here is a minimal reproducible example.
import numpy as np
import matplotlib.pyplot as plt
x_list = np.linspace(-10,10,num=128)
y = list(map(lambda x: (x**2 + x + 1), x_list))
plt.plot(y, label="$\\Psi_{example}$")
plt.legend()
plt.xticks(np.linspace(0, 128, num=8),
           list(map(round, np.linspace(-10, 10, num=8), [0] * 8)))
plt.savefig("./example.pdf")
Which produces the following image.
When saving this image as vector graphics, all the numbers as well as the 'example' word in the legend become selectable/searchable (i.e. rendered as text). However, the Ψ (Psi) character is not selectable/searchable.
Is there any way to make math symbols render as text in vector graphics?
I got it to work in the way I think you want by first installing a LaTeX distribution (I used MiKTeX) and then setting the matplotlib option that uses LaTeX to render your symbols and text.
Note that after installing MiKTeX, I had to open a new instance of my command prompt or code editor so that it picked up the change to my PATH and could find the LaTeX installation.
I added the import and mpl.rcParams lines to your example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['text.usetex'] = True
x_list = np.linspace(-10, 10, num=128)
y = list(map(lambda x: (x**2 + x + 1), x_list))
plt.plot(y, label="$\\Psi_{example}$")
plt.legend()
plt.xticks(np.linspace(0, 128, num=8),
           list(map(round, np.linspace(-10, 10, num=8), [0] * 8)))
plt.savefig("./example.pdf")
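The PATH note above can be checked from Python before switching the option on. This is a minimal sketch under the assumption that the distribution exposes an executable named `latex`; `enable_usetex_if_available` is a hypothetical helper name, and a working usetex setup may need further tools beyond that executable.

```python
import shutil

import matplotlib as mpl


def enable_usetex_if_available():
    """Turn on text.usetex only when a `latex` executable is on PATH."""
    latex_path = shutil.which("latex")  # None if the shell cannot find it
    mpl.rcParams["text.usetex"] = latex_path is not None
    return latex_path is not None


latex_found = enable_usetex_if_available()
print("LaTeX found on PATH:", latex_found)
```

If this prints False right after installing the distribution, opening a new terminal or editor session, as described above, is the first thing to try.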
These are two different matters. Characters are represented by codes. In this case, you cannot select some characters because the software you are using to display the rendered result does not have that Unicode character defined in its font library, so it treats the character as an object or an empty box (commonly called “tofu”). The render engine that turns your Python code (or TeX file) into a PDF/SVG does understand that Unicode character, which is why you can see it. So much for the source of the issue.
Solution: You may use another IDE/browser if that is what you are using to view the results. Chrome usually supports most Unicode characters, except for those that were defined very recently.
Moreover, Ψ (Psi) is a Greek letter. Check whether your operating system has Greek letters installed in its font library. If it doesn't, go to the Unicode Consortium website and search for "Display Problems"; it will bring up a page explaining how to install a font depending on your OS or browser.

stylecloud does not show underline words

As far as I know, and from what I have read, the definition of a wordcloud is the following:
A wordcloud is a popular technique that helps us identify the keywords in a text.
In a wordcloud, more frequent words have a larger and bolder font, while less frequent words have smaller or thinner fonts.
In Python, you can make simple wordclouds with the wordcloud library and nice-looking wordclouds with the stylecloud library.
I have the following code to plot the underlined words and keywords from the text:
import numpy as np
import matplotlib.pyplot as plt
import stylecloud
stylecloud.gen_stylecloud(file_path='SJ-Speech.txt',
                          icon_name="fas fa-apple-alt")
plt.show()
The expected output should be this:
but the result is nothing:
C:\Users\User\PycharmProjects\AI_Project\venv\Scripts\python.exe C:/Users/User/PycharmProjects/AI_Project/Word_Clous_Example.py
Process finished with exit code 0
Did I miss something? Please help me.

How to display non-English fonts in matplotlib and networkx?

This is a follow-up to an earlier question. Since it addresses a more general issue, I am posting it as a new question.
I have a network in which the node labels are in Farsi (Arabic alphabet). When I try to display my network with networkx, it shows blank squares instead of Arabic letters. Below I copy a good example provided in the answers to that question.
from bidi.algorithm import get_display
import matplotlib.pyplot as plt
import arabic_reshaper
import networkx as nx
# Arabic text preprocessing
reshaped_text = arabic_reshaper.reshape(u'زبان فارسی')
artext = get_display(reshaped_text)
# constructing the sample graph
G=nx.Graph()
G.add_edge('a', artext ,weight=0.6)
pos=nx.spring_layout(G)
nx.draw_networkx_nodes(G,pos,node_size=700)
nx.draw_networkx_edges(G,pos,edgelist=G.edges(data=True),width=6)
# Drawing Arabic text
# Just Make sure your version of the font 'Times New Roman' has Arabic in it.
# You can use any Arabic font here.
nx.draw_networkx_labels(G,pos,font_size=20, font_family='Times New Roman')
# showing the graph
plt.axis('off')
plt.show()
which generates the following image:
I tried to select the needed font with the following commands in Python, but I get the same result.
>>> import matplotlib.pyplot
>>> matplotlib.rcParams.update({'font.family': 'TraditionalArabic'})
Here is the warning message, to be more specific:
/usr/local/anaconda3/lib/python3.5/site-packages/matplotlib/font_manager.py:1288: UserWarning: findfont: Font family ['TraditionalArabic'] not found. Falling back to Bitstream Vera Sans
(prop.get_family(), self.defaultFamily[fontext])
I am also investigating ways to install the needed fonts from the Ubuntu CLI, if possible, and to put that in my Dockerfile, since everything gets installed again every time I spin up my runs.
Best regards, s.
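The "Font family not found" warning above means matplotlib has no font registered under that name. A minimal sketch for checking which family names are registered follows; `available_font_families` and `has_font` are hypothetical helper names introduced here, and the sketch only inspects matplotlib's own font list rather than installing anything.

```python
from matplotlib import font_manager


def available_font_families():
    """Sorted names of the font families matplotlib has registered."""
    return sorted({font.name for font in font_manager.fontManager.ttflist})


def has_font(family):
    """True if matplotlib can resolve this exact family name."""
    return family in available_font_families()


families = available_font_families()
print(has_font("TraditionalArabic"))
```

If the name is missing, recent matplotlib releases also allow registering a font file directly with `font_manager.fontManager.addfont(path)`; whether that is available depends on the installed version, so treat it as an assumption to verify.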

Create wordcloud from dictionary values

I just wrote a script that extracts all the spoken text in the Dutch Parliament from a few thousand XML files. For every speaker, it counts the number of times that speaker said each word.
After doing this, I calculated the TF*IDF value of every word for each speaker in the Dutch Parliament. If you are not familiar with this, see this link: TF IDF explanation
So now I have a dictionary for each speaker in the Dutch Parliament where the keys are the words the speaker said and the values are the corresponding TF*IDF values:
{u'asielzoekers': 0.0034861170591325486,
u'belastingverlaging': 0.0018551991553514675,
u'buma': 0.0020712555982839408,
u'islam': 0.0029519544163739155,
u'moslims': 0.0027958002747301355,
u'ouderen': 0.0022803123245457566,
u'pechtold': 0.0021525864470786928,
u'president': 0.003281844532743345,
u'rutte': 0.0023488684001475584,
u'samsom': 0.0019304632325980841}
Right now I want to create a wordcloud from these values. I have briefly looked into the wordcloud module written by amueller, but as far as I can see this module does not work with a dictionary, only with plain text.
So any help on how to create a wordcloud from a dictionary's values would be highly appreciated.
Thanks in advance!
# the TF*IDF dictionary from the question (middle entries elided)
dictionary = {u'asielzoekers': 0.0034861170591325486, ..., u'samsom': 0.0019304632325980841}
from PIL import Image
import matplotlib.pyplot as plt
from wordcloud import WordCloud
wc = WordCloud(background_color="white", width=1000, height=1000, max_words=10,
               relative_scaling=0.5, normalize_plurals=False).generate_from_frequencies(dictionary)
plt.imshow(wc)
import matplotlib.pyplot as plt
from wordcloud import WordCloud
word_cloud_dict = {'Git': 100, 'GitHub': 100, 'push': 50, 'pull': 10, 'commit': 80, 'add': 30, 'diff': 10,
                   'mv': 5, 'log': 8, 'branch': 30, 'checkout': 25}
wordcloud = WordCloud(width=1000, height=500).generate_from_frequencies(word_cloud_dict)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
And we get:
Creating Wordcloud with Dictionaries
5th of February 2021
I could not produce a wordcloud image from a dictionary's name-value pairs with any of WordCloud's methods generate(), generate_from_text() or generate_from_frequencies(), however many times I tried.
Having checked Stack Overflow for a way around this, I tried replicating the solutions from the answers above in my program. They did not resolve my issue, which was a TypeError exception raised when creating and displaying the wordcloud image.
After checking the WordCloud API documentation on the module's official site, I found that you have to use something called "multidict". It is a Python module that acts like a dictionary, utilising..
"[a] collection of key-value pairs where key might be occurred more than once in the container"
- quoted from Multidict's main PyPI introductory page
For more information on the multidict module, see: https://multidict.readthedocs.io/en/stable/
The official GitHub repository is here: https://github.com/aio-libs/multidict
Extracted from WordCloud's "Gallery of Examples" page, here is a snippet that uses the multidict module to build a frequency dictionary, which is then visualised as a wordcloud:
import multidict as multidict
...
def getFrequencyDictForText(sentence):
    # instantiate multidict object
    fullTermsDict = multidict.MultiDict()
    tmpDict = {}
    # making dict for counting frequencies
    for text in sentence.split(" "):
        ...
        val = tmpDict.get(text, 0)
        tmpDict[text.lower()] = val + 1
    for key in tmpDict:
        fullTermsDict.add(key, tmpDict[key])
    return fullTermsDict
def makeImage(text):
    alice_mask = np.array(Image.open("alice_mask.png"))
    # instantiate and define wordcloud properties
    wc = WordCloud(background_color="white", max_words=1000, mask=alice_mask)
    # generate wordcloud
    wc.generate_from_frequencies(text)
    # display and show "wc"
    plt.imshow(wc, interpolation="bilinear")
    plt.show()
...
...
Note: This is not the full source. To see the whole code, check out amueller's WordCloud website.
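For comparison, the earlier answers on this page pass a plain dict to generate_from_frequencies. A minimal sketch of building such a dict from a sentence with the standard library's collections.Counter, swapped in here for multidict, is shown below. `frequency_dict` is a hypothetical helper name, and whether a plain dict avoids the TypeError described above is an assumption, not something this answer confirms.

```python
from collections import Counter


def frequency_dict(sentence):
    """Count case-insensitive word occurrences in a whitespace-separated string."""
    words = (word.lower() for word in sentence.split())
    return dict(Counter(words))


freqs = frequency_dict("Git git push pull push git")
print(freqs)  # {'git': 3, 'push': 2, 'pull': 1}
```

Lowercasing every word before counting keeps the lookup and the stored key consistent, so "Git" and "git" land in the same entry.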

How to create 'normal' looking axis labels using latex in matplotlib

I have the following piece of code to create axis labels with German umlauts:
plt.xlabel('Daten')
plt.ylabel(r'$H\ddot{a}ufigkeit$')
which basically works and prints the a-umlaut correctly, but the fonts of the x and y labels are now different, as the y label is printed in math mode. Changing the second line to
plt.ylabel(r'$\textrm{H\ddot{a}ufigkeit}$')
should work as far as I know (in order to get a roman font instead of the math mode font), but gives a Python error:
matplotlib.pyparsing.ParseFatalException: Expected end of math '$'
How can I fix this issue so that both axes use the same font, with umlauts still possible?
The non-math umlaut is \":
plt.ylabel(r'H\"{a}ufigkeit')
If you need \ddot only put the $ around that:
plt.ylabel(r'H$\ddot{a}$ufigkeit')
As an aside, the \textrm command only works in text mode. The math-mode equivalent is \mathrm:
plt.ylabel(r'$\mathrm{H\ddot{a}ufigkeit}$')
UPDATE
All of the above assume that you have told matplotlib to render with tex. To do this, add the following at the top of your code:
import matplotlib.pyplot as plt
plt.rc('text', usetex=True)
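If LaTeX rendering is not required, the umlaut can also be written directly as a Unicode character, which keeps both labels in the same default font. This is a minimal sketch, assuming Python 3 and a font that contains the character (matplotlib's bundled default font does); the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlabel("Daten")
ax.set_ylabel("Häufigkeit")  # plain text label, no math mode involved
fig.canvas.draw()  # force a render so font problems would surface here
```

This sidesteps the math-mode font entirely, at the cost of not using TeX for the labels.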
