This question already has answers here:
Saving plots (AxesSubPlot) generated from python pandas with matplotlib's savefig
(6 answers)
Pandas plotting in Windows terminal
(2 answers)
Pandas plot doesn't show
(4 answers)
Closed 7 months ago.
I am aware that pandas offers the ability to visualize data with plots. Most of the examples I can find, and even the pandas documentation itself, use Jupyter Notebook.
This code doesn't work in a raw Python shell.
#!/usr/bin/env python3
import pandas as pd
df = pd.DataFrame({'A': range(100)})
obj = df.hist(column='A')
# array([[<AxesSubplot:title={'center':'A'}>]], dtype=object)
How can I "show" that?
This script does not run in an IDE. It runs in a Python 3.9.10 shell interpreter in a Windows "DOS box" on Windows 10.
Installing Jupyter or transferring the data to an external service is not an option in my case.
Demonstrating a solution building on code provided by OP:
Save this as a script named save_test.py in your working directory:
import pandas as pd
df = pd.DataFrame({'A': range(100)})
the_plot_array = df.hist(column='A')
fig = the_plot_array[0][0].get_figure()
fig.savefig("output.png")
Run that script on the command line using python save_test.py.
You should see it create a file called output.png in your working directory. Open the generated image with your favorite image viewer. If you are doing this remotely, download the image file and view it on your local machine.
You should also be able to run those lines in succession in an interpreter, if the OP prefers.
Explanation:
This solution relies on the fact that pandas plotting uses Matplotlib as its default plotting backend (which can be changed), so you can use Matplotlib's ability to save generated plots as images. It combines that with Wael Ben Zid El Guebsi's answer to 'Saving plots (AxesSubPlot) generated from python pandas with matplotlib's savefig' and with using type() to drill down and see that the pandas histogram is returned as a NumPy array of arrays. (The first item in the inner array is a matplotlib.axes._subplots.AxesSubplot object, which the_plot_array[0][0] retrieves. The get_figure() method returns the figure that contains that matplotlib.axes._subplots.AxesSubplot object.)
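If you want to repeat that drill-down yourself, here is a minimal sketch. It assumes the non-interactive Agg backend (my choice, so it runs without a display), and the name axes_grid is just illustrative. Newer Matplotlib releases may report the inner class as Axes rather than AxesSubplot.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed

import numpy as np
import pandas as pd
from matplotlib.figure import Figure

df = pd.DataFrame({"A": range(100)})

# DataFrame.hist returns a 2-D NumPy array of axes objects, even for one column.
axes_grid = df.hist(column="A")
print(type(axes_grid))
print(axes_grid.shape)

# Drill down to the single axes object, then to the figure that contains it.
ax = axes_grid[0][0]
print(type(ax).__name__)
fig = ax.get_figure()
print(type(fig).__name__)
```

Once you hold fig, calling fig.savefig("output.png") works as in the script above.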
Try something like this
df = pd.DataFrame({'A': list(range(100))})
df.plot(kind='line')
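In a plain shell or script, the plot above is only drawn in memory; a window usually appears once matplotlib.pyplot.show() is called. A rough sketch of that, assuming an interactive Matplotlib backend is available on the machine (show_line_plot and its show switch are hypothetical names, not part of pandas):

```python
import matplotlib.pyplot as plt
import pandas as pd


def show_line_plot(df, show=True):
    """Draw df as a line plot; open the plot window when show is True."""
    ax = df.plot(kind="line")
    if show:
        # With an interactive backend this blocks until the window is closed.
        plt.show()
    return ax


df = pd.DataFrame({"A": list(range(100))})
```

Calling show_line_plot(df) at the end of a script, or in the interpreter, should then open the plot window. Passing show=False only builds the plot, which is handy if you plan to save it instead.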
Related
I would like to create a dynamic line chart in Voila. How can I manipulate the below code to show a standard line graph where the x axis equals column "a" and the y axis equals column "b"? Potentially the user can then dynamically update the output to make the y axis equal to column "c" by drag and drop etc.
from pivottablejs import pivot_ui
import pandas as pd
import IPython
df = pd.DataFrame(("a": [1,2,3], "b": [30,45,60],"c": [100,222,3444]))
display.display(df)
pivot_ui(df,outfile_path='pivottablejs.html',
rendererName="Line Chart",
cols= ["b","c"]
rows= ["a"],
aggregatorName="Sum"
)
display.display(IPython.display.HTML('pivottablejs.html"))
Thank you.
This should be a comment but I cannot post code with comments easily.
Please always test your code, preferably in a new notebook so it has a fresh kernel, before you post it.
Your dataframe assignment won't work. Should it be something like below?
That display code won't work in Jupyter notebook classic or JupyterLab presently.
Try something like this for assignment and display:
import pandas as pd
df = pd.DataFrame({"a": [1,2,3], "b": [30,45,60],"c": [100,222,3444]})
display(df)
That works in the classic notebook, JupyterLab, and Voila to make & display the dataframe.
Relatedly, it is advisable to develop for Voila in JupyterLab. JupyterLab's rendering machinery is more modern and so closer to what Voila uses.
You can easily test renderings in launches from the Voila binder example page. Go there and click 'launch binder' for the appropriate rendering. From JupyterLab, you can select the Voila icon from the toolbar just above an open notebook and get the Voila rendering on the side.
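For the static part of the question (x axis from column "a", y axis from column "b"), a plain pandas line plot may be enough as a starting point. This is only a sketch using pandas' own plotting, not pivottablejs, and it does not cover the drag-and-drop part:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [30, 45, 60], "c": [100, 222, 3444]})

# x and y select which columns go on each axis; swap y="b" for y="c" to change the series.
ax = df.plot(x="a", y="b", kind="line")
```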
QUESTION #1) I am new to Python and coding in general. I want to take my data from a CSV which has a column labeled "U.S. OSHA Recordable?". In that column every answer is either "yes" or "no". I want to display a plot.bar that shows "23 yes's" and "7 no's": essentially, add up the totals of "yes" and "no" in the column, then display them in one clean bar graph with 2 bars and the total number on top of each bar. The problem is that right now the bar graph has a single line on the X axis, and it reads "no, yes, no, yes, yes, no" about 27 individual times. I want users to easily see one bar graph showing only 2 bars with the totals on top, like this image.
This is my code. I am not sure what I would need to sum up the Yes and No in the column.
import pandas as pd # data analysis library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top; encoding lets pandas decode this CSV
data.head() # returns the first five rows (nothing is displayed when run as a script)
data.plot.bar(x='U.S. OSHA Recordable?') #creates a plot in pandas
plt.show() # shows the plot to the user
df['Val'].value_counts().plot(kind='bar')
Here Val is the name of the column that contains 'Yes' & 'No'
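To also get the totals on top of the bars, something like the sketch below might work. The 23/7 sample data is made up to mirror the numbers in the question, the Agg backend and the output file name osha_counts.png are my assumptions so it runs without a display, and ax.bar_label needs Matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display; drop this line and use plt.show() for a window

import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the CSV column: 23 "Yes" and 7 "No" answers.
data = pd.DataFrame({"U.S. OSHA Recordable?": ["Yes"] * 23 + ["No"] * 7})

# value_counts() collapses the column into one total per distinct answer.
counts = data["U.S. OSHA Recordable?"].value_counts()

ax = counts.plot(kind="bar")
ax.bar_label(ax.containers[0])  # write each total above its bar (Matplotlib >= 3.4)
ax.figure.savefig("osha_counts.png")
```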
import pandas as pd # data analysis library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
import seaborn as sns # it counts everything for you and outputs it exactly like I want
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top; encoding lets pandas decode this CSV
sns.set(style="whitegrid")
ax = sns.countplot(x='U.S. OSHA Recordable?', data=data)
plt.show() # shows the plot to the user
Interestingly enough, I found out about "seaborn", pip installed it, and gave it a shot. It is supposed to pull data from a URL, but after viewing a few other pages on Stack Overflow I found a great suggestion. Anyway, this works great and it does everything for me. I am so happy with this solution. Now on to the next problem lol. I hope this helps someone else in the future.
My graph looks exactly like the one posted by SH-SF, by the way. Works great.
I am using Python and R code in a Jupyter notebook at the same time. Specifically, I want to use pandas to deal with the data, pass the DataFrame object to the R kernel, and then use ggplot2 to visualize it.
However, whenever I pass the pandas DataFrame object to the R kernel and use ggplot() to make plots, the Jupyter notebook always gives the following warning:
C:\Study\Anaconda3-5.2.0\lib\site-packages\rpy2-2.9.4-py3.6-win-amd64.egg\rpy2\robjects\pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
res = PandasDataFrame.from_items(items)
My code is very simple, as follows:
%load_ext rpy2.ipython
%R library(ggplot2)
# data_train is a pandas DataFrame object
%%R -i data_train
ggplot(data = data_train,aes(x = factor(Survived))) + geom_bar(fill = "#539bf3")
You could do it directly in Python using the Python ggplot library.
This is not exactly what you are asking, but I mention it in case you overlooked it.
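If you would rather stay with rpy2, one possible workaround is to hide just that message with Python's standard warnings module before running the %%R cell. This is a sketch under the assumption that silencing the warning is acceptable; it does not change the deprecated call inside rpy2. The helper name silence_from_items_warning is hypothetical.

```python
import warnings


def silence_from_items_warning():
    """Ignore only the FutureWarning whose text starts with 'from_items is deprecated'."""
    warnings.filterwarnings(
        "ignore",
        message="from_items is deprecated",  # regex matched against the start of the message
        category=FutureWarning,
    )


silence_from_items_warning()
```

Other FutureWarning messages are still shown, since the filter matches on the message text.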
I have the following code in PyCharm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("c:/temp/datafile.txt", sep='\t')
df.head(10)
I get the following output:
Process finished with exit code 0
I am supposed to get the first ten rows of my datafile, but these do not appear in PyCharm.
I checked the Project interpreter and all settings seem to be alright there. The right packages are installed (numpy, pandas, matplotlib) under the right Python version.
What am I doing wrong? Thanks.
PyCharm is not the Python shell, which automatically prints the result of every expression.
In PyCharm you have to use print() to display anything.
print(df.head(10))
The same applies when you run the script in another IDE or editor, or directly with python script.py.
To print all the data:
print(df)
By default, head() returns the top 5 records:
print(df.head())
If you need 10 rows, you can write it this way:
print(df.head(10))
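To see the difference without the data file, here is a small self-contained sketch (the 20-row DataFrame is made up for illustration):

```python
import pandas as pd

# Made-up data so the example runs without c:/temp/datafile.txt.
df = pd.DataFrame({"value": range(20)})

first_five = df.head()    # default is 5 rows
first_ten = df.head(10)   # ask for 10 rows explicitly

df.head(10)               # as a bare expression in a script, this displays nothing
print(first_ten)          # print() is what sends the rows to the console
```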
I used File > Invalidate Caches / Restart, chose the Invalidate option, and after that I was able to get the head.
This question already has answers here:
Read .mat files in Python
(15 answers)
Closed 7 years ago.
I'm translating code from Matlab into Python, and inevitably there are some bugs. I'm going through the code comparing variables to ensure the methods are equivalent.
Is there a way to import Matlab workspace variables into Spyder (or the other way around) so I can do a boolean truth comparison for each variable?
I saved the Matlab workspace as a .mat file at 'File Location'.
import h5py
import numpy as np
# h5py can only open .mat files saved in the HDF5-based v7.3 format
f = h5py.File('File Location', 'r')
matlab_arr=f['array name']
matlab_arr=np.array(matlab_arr,dtype='f8')
You could then do a comparison with:
(matlab_arr==python_arr).all()
or
np.isclose(matlab_arr, python_arr).all()
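One thing to watch for: arrays read from a v7.3 file through h5py often come back with their axes reversed relative to Matlab. A small helper along these lines might make the comparison less fragile. The name arrays_match and the tolerance defaults are my own choices, and the transpose fallback is an assumption that only triggers when the shapes differ (a square matrix would not be detected).

```python
import numpy as np


def arrays_match(matlab_arr, python_arr, rtol=1e-5, atol=1e-8):
    """Return True if the two arrays agree within floating-point tolerance."""
    a = np.asarray(matlab_arr, dtype="f8")
    b = np.asarray(python_arr, dtype="f8")
    # Assumption: a shape mismatch that a transpose resolves comes from reversed axes.
    if a.shape != b.shape and a.T.shape == b.shape:
        a = a.T
    return a.shape == b.shape and np.allclose(a, b, rtol=rtol, atol=atol)
```

Usage would be arrays_match(matlab_arr, python_arr) for each variable pair. The tolerance-based check is usually more forgiving of rounding differences than an exact == comparison.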