Interactive visualization - select which csv to visualize - python

I'm writing interactive visualization code in Python.
What I would like to do is create an interactive visualization that lets the user select a file from a dropdown menu (or something similar) and then draws a bar plot of the selected data.
My data folder has the following structure:
+-- it_features
| +-- it_2017-01-20--2017-01-27.csv
| +-- it_2017-01-27--2017-02-03.csv
| +-- it_2017-02-03--2017-02-10.csv
and so on (there are many more files; I'm showing only a few of them for simplicity).
So far I'm able to access and retrieve all the data contained in the folder:
import os
import pandas as pd
path = os.getcwd()
file_folder = os.path.join(path,'it_features')
for csv_file in os.listdir(file_folder):
    print(csv_file)
    file = os.path.join(file_folder,csv_file)
    df = pd.read_csv(file)
    #following code....
What I would like to do is create an interactive visualization which allows the user to select the file name (for example it_2017-02-03--2017-02-10.csv) and plot the data of that file.
I'm able to select the file I want "by hand" and plot its data by putting its filename in a variable and then retrieving the data, but I would like to avoid hard-coding it and instead let the end user browse and select one of the files using a dropdown menu or something similar.
My simple code:
import os
import pandas as pd
path = os.getcwd()
file_folder = os.path.join(path,'it_features')
file = os.path.join(file_folder,'it_2020-02-07--2020-02-14.csv') # Here I insert my filename
df=pd.read_csv(file)
ax=df.value_counts(subset=['Artist']).head(10).plot(y='number of songs',kind='bar', figsize=(15, 7), title="7-14 February 2020")
ax.set_xlabel("Artist")
ax.set_ylabel("Number of Songs Top 200")
Which generates the following plot:
As I already said, I would like to add some kind of dropdown menu that lets the user select the csv data they want to plot in an interactive plot.
I saw that it's possible to create dropdown menus with Plotly, but in the various examples (https://plotly.com/python/dropdowns/) it doesn't seem to select and then load the data.
I also saw this code (Kaggle code) which seems to do what I wanted to do: you can select the region and plot the data from that region.
The main problem is that the author just creates one big dataframe with all the US states, and then creates a trace for each one of them.
What I would like to do (if possible) is select the file name from the dropdown, load that csv and then plot its data, without creating a single giant dataframe with all my files in it.
Is it possible?
EDIT: The solution proposed by gherka works perfectly, but I would like to have a solution inside Plotly using its dropdown menu.

Since you're working in Jupyter Notebook, you have a number of different options.
Some visualisation libraries have built-in widgets that you can use, but they often require you to run a server or provide a javascript callback. For a library-agnostic approach, you can use ipywidgets. This library is specifically for creating widgets to be used in Jupyter Notebooks. The documentation is here.
To create a simple dropdown with a static bar plot underneath, you would need three widgets - Label for dropdown description, Dropdown and Output. VBox is for laying them out.
from ipywidgets import VBox, Label, Dropdown, Output
desc = Label("Pick a .csv to plot:")
dropdown = Dropdown(
    options=['None', 'csv1', 'csv2', 'csv3'],
    value='None',
    disabled=False)
output = Output()
dropdown.observe(generate_plot, names="value")  # generate_plot (shown further down) must already be defined when this line runs
VBox([desc, dropdown, output])
The key element is the generate_plot function. It must have a single parameter that you use to decide what effect the widget action has on your plot. When you interact with the dropdown, the generate_plot function will be called and passed a dictionary with "new" value, "old" value and a few other things.
Here's a function to generate a basic seaborn bar chart with an adjustable data source. Notice I had to include an explicit plt.show() - plots won't render otherwise.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def generate_plot(change):
    with output:
        output.clear_output()  # reset the view
        if change["new"] != "None":
            data = pd.read_csv(...)  # your custom code based on dropdown selection
            sns.catplot(x="Letters", y="Numbers", kind="bar", data=data)
            plt.show()  # explicit show; the plot won't render inside the Output widget otherwise
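One way to fill in both the dropdown options and the pd.read_csv(...) placeholder is to drive them from the folder listing. The following is only a sketch: list_csv_files and selected_path are hypothetical helper names, and the folder name it_features is taken from the question.

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the sorted names of the .csv files directly inside `folder`."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


def selected_path(folder, selection):
    """Map a dropdown value back to a full path; 'None' means nothing is selected."""
    if selection == "None":
        return None
    return Path(folder) / selection
```

With these helpers, the dropdown could be built as Dropdown(options=["None"] + list_csv_files("it_features"), value="None"), and generate_plot could read pd.read_csv(selected_path("it_features", change["new"])). Only the selected file is loaded, so no combined dataframe is needed.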
If you have many large .csv files, you might also want to implement a caching system that keeps the last few user selections in memory, so they are not re-read on each selection.
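A minimal sketch of such a cache, assuming the standard library's functools.lru_cache is acceptable for the job. make_cached_loader is a hypothetical helper name, and the default of five cached files is an arbitrary choice.

```python
from functools import lru_cache


def make_cached_loader(reader, maxsize=5):
    """Wrap `reader(path)` so the `maxsize` most recently used results stay in memory."""

    @lru_cache(maxsize=maxsize)
    def load(path):
        # Only runs on a cache miss; repeated selections reuse the stored result.
        return reader(path)

    return load
```

For example, load_csv = make_cached_loader(pd.read_csv) gives a drop-in replacement for pd.read_csv inside generate_plot. The path has to be hashable (a string works), and the cached dataframe is shared between calls, so it should be copied before being modified in place.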
For a more in-depth look at how to add interactivity to matplotlib plots using ipywidgets I found this tutorial quite useful.

tkinter is a super common UI framework for python, and is part of the standard library. Based on answers in a similar question, you can use this:
from tkinter.filedialog import askopenfilename
filename = askopenfilename()
which pops up a standard file explorer window.
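If the dialog should only offer csv files and start in the data folder, something along these lines might work. It is a sketch rather than a tested recipe: ask_for_csv is a hypothetical name, and the imports sit inside the function only so that the module can still be imported on a machine without a display.

```python
def ask_for_csv(initial_dir="."):
    """Open a native file dialog filtered to .csv files; return the chosen path or None."""
    import tkinter as tk
    from tkinter.filedialog import askopenfilename

    root = tk.Tk()
    root.withdraw()  # hide the empty main window that Tk would otherwise show
    try:
        path = askopenfilename(
            parent=root,
            initialdir=initial_dir,
            title="Select a CSV file",
            filetypes=[("CSV files", "*.csv"), ("All files", "*.*")],
        )
    finally:
        root.destroy()
    return path or None  # cancelling the dialog yields an empty value
```

The returned path can be passed straight to pd.read_csv.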

Related

How to make Kepler.gl plots consistent

I'm using kepler.gl to create some geographical plots.
I want to create some settings for the plot and then have them applied by default whenever I run the cell containing that plot.
These are the steps:
I copied the conf from Kepler.gl website from Share>Share Map> Map Config.
I pasted the copied text into a cell of my Python notebook, in the variable conf:
conf = {. . .} # copied from the website after setting the correct visualization
I replaced all true with True, false with False etc.
In the notebook I created a map and set the config
from keplergl import KeplerGl
import geopandas as gpd  # needed for gpd.read_file below
import json
map1 = KeplerGl()
map1.add_data(data=gpd.read_file('my.geojson'), name='name1')
map1.config = conf
I modified the label property in conf so that it would be equal to the name property inside the add_data function
Finally, when I show the plot using
map1
it shows a basic plot of my geojson without any of the configurations in conf.
NB.
The geojson file I uploaded in Kepler.gl is the same file I used in KeplerGL() python function
I read online that it can be due to the IDs of the datasets but I don't understand how to make those IDs the same since I'm using the same dataset.
Here's the documentation, personally I didn't manage to find the answer to my question in there but maybe there is and I didn't understand it.
You need to pass the name that you define in map1.add_data() to the dataId property (instead of the label property) of your config.
So in your example:
"dataId": ["name1"]
💡 Important: Depending on your style settings, the field "dataId" might exist more than once. You need to replace the value in all instances.
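Replacing every instance by hand is error-prone, so a small recursive helper can do it instead. The sketch below is an illustration only: set_data_id is a hypothetical name and the raw string is a made-up fragment, not a real exported config. Parsing the copied text with json.loads also converts true, false and null to their Python equivalents, so they don't have to be edited manually.

```python
import copy
import json


def set_data_id(config, new_id):
    """Return a copy of `config` with every "dataId" value replaced by `new_id`."""

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "dataId":
                    # Keep the original shape: a list stays a list, a string stays a string.
                    node[key] = [new_id] if isinstance(value, list) else new_id
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    walk(updated)
    return updated


# Made-up fragment standing in for the text copied from the website.
raw = (
    '{"config": {"visState": {'
    '"layers": [{"config": {"dataId": "abc", "isVisible": true}}], '
    '"filters": [{"dataId": ["abc"]}]}}}'
)
conf = set_data_id(json.loads(raw), "name1")
```

The result can then be assigned with map1.config = conf. Note that this only touches keys literally named dataId; any other place where the dataset ID appears would still need to be checked by hand.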

How to sum up yes and no into a total quantity, using matplotlib, pandas, python from a CSV import to plot a graph

QUESTION #1) I am new to Python and coding in general. I want to take my data from a CSV which has a column labeled "U.S. OSHA Recordable?". In that column every answer is either "yes" or "no". I want to display a plot.bar that shows "23 yes" and "7 no": essentially, add up the total number of "yes" and "no" answers in the column, then display the totals in one clean bar graph with 2 bars and the total number on top of each bar. The problem is that the bar graph currently has a separate tick on the X axis for every row, reading "no, yes, no, yes, yes, no" about 27 individual times. I want users to easily see one bar graph showing only 2 bars with the totals on top, like this image.
This is my code. I am not sure what I would need to sum up the Yes and No in the column.
import pandas as pd # powerful data visualization library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top of the file; encoding lets pandas decode this CSV
data.head() # previews the first five rows, so you can check that the header was read correctly
data.plot.bar(x='U.S. OSHA Recordable?') #creates a plot in pandas
plt.show() # shows the plot to the user
df['Val'].value_counts().plot(kind='bar')
Here Val is the name of the column that contains 'Yes' & 'No'
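To also get the totals written on top of the bars, as asked in the question, a sketch along the following lines may help. The dataframe here is made-up sample data, plot_counts is a hypothetical helper name, and ax.bar_label needs matplotlib 3.4 or newer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the real CSV column.
df = pd.DataFrame({"Val": ["Yes", "No", "Yes", "Yes", "No"]})

# One entry per distinct answer, sorted from most to least frequent.
counts = df["Val"].value_counts()


def plot_counts(counts):
    """Draw one bar per answer and write each total above its bar."""
    ax = counts.plot(kind="bar", rot=0)
    ax.bar_label(ax.containers[0])  # requires matplotlib 3.4 or newer
    ax.set_ylabel("Count")
    return ax
```

Calling plot_counts(counts) followed by plt.show() displays two bars, one per answer, each labelled with its total.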
import pandas as pd # powerful data visualization library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
import seaborn as sns # it counts everything for you and outputs it exactly like I want
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top of the file; encoding lets pandas decode this CSV
sns.set(style="whitegrid")
ax = sns.countplot(x='U.S. OSHA Recordable?', data=data)
plt.show() # shows the plot to the user
Interestingly enough, I found out about "seaborn", pip installed it and gave it a shot. It is supposed to pull data from a URL, but after viewing a few other pages on Stack Overflow I found a great suggestion. Anyway, this works great and it does everything for me. I am so happy with this solution. Now on to the next problem, lol. I hope this helps someone else in the future.
My graph looks exactly like the one posted by SH-SF btw. Works great

Edit python script used as Data entry in Power BI

I have a python script and used it to create a dataframe in Power BI.
Now I want to edit that dataframe in Power BI, but I don't want to enter it from scratch as new data, because I want to keep all the charts inside my Power BI model.
For example, in my old dataframe I specified some dates inside my script, so the information was limited to those dates. Now I want to change the dates to new ones, but I don't want to lose the whole model.
df = df
You can edit the python scripts doing the following steps:
Open Query Editor
Under 'Applied steps', the first step, Source, has a small gear symbol on its right side; click on it.
You can then change the script directly in Power Query.
I hope you're not doing this in a PowerBI Python Visual. If you're using Python under the Transform tab in the Power Query Editor, the key to your problem lies not in Python itself, but rather in the reference function available to you if you right-click the table under queries in the Power Query Editor:
Try this:
1: Save the following sample data in a csv as C:\pbidata\src.csv file and load it into PowerBI using Get Data > Text/Csv
A,B,C
1,10,100
2,20,200
3,30,300
2: Display it as a table:
3: Open the Power Query Editor through Edit Queries
4: Add some Python
Here you can insert a Python snippet after the Changed type step under Applied steps with Transform > Run Python Script. Inserting the following example code:
# 'dataset' holds the input data for this script
import pandas as pd
df=dataset.copy(deep=True)
df['D']=df['C']*2
... will give you this:
5: And let's say that you're happy with this for now and that you'd like to make a plot out of it back on the Power BI Desktop. I'm using a clustered bar chart to get this:
6: Now, like you're saying, if you'd like to have df['D']=df['C']/4 instead, but retain the same dataset, Python script and figure Plot 1, Py script 1, go back to the Power Query Editor and use the functionality that I mentioned in the beginning:
7: And add another Python snippet:
# 'dataset' holds the input data for this script
import pandas as pd
df=dataset.copy(deep=True)
df['D']=df['D']/4
And there we go:
Now you have two different Python snippets that build on the same dataset. You still have the data from the first snippet, and you can do whatever you want with the second snippet without messing up your data source.
8: Insert another chart to verify:
9: Maybe have some fun with the whole thing by changing the source file:
Data:
A,B,C
100,10,100
2,20,200
3,30,150
New plots:

Color "Undefined" in Altair graph

I'm using the altair python library to create html files with vega-lite specifications. I'm encountering a problem where color is not being displayed in my plot. Here is the code:
import altair as alt
import pandas
data = 'Test.csv' #this contains three columns: Rating, Frequency, and typ, where 'typ' is either E or O.
a = alt.Chart(data).mark_bar().encode(
alt.X('Rating', type = 'ordinal'),
alt.Y('Frequency',type = 'quantitative'),
alt.Color('typ', type = 'nominal')
)
a.save('altairtest.html')
I get a graph without colors, and the legend comes up as 'typ' but only with blue, and text reading "undefined".
I am currently working locally on a SimpleHttpsServer. Could this be the reason why? For my purposes it is easier this way than using jupyter. Thanks
This usually indicates that there is an issue in your data file. I can reproduce your issue with a Data.csv file that looks like this:
Rating,Frequency, typ
0,1,O
1,2,E
Then the resulting chart looks like this:
Notice the space before typ in the header. Spaces are significant in CSV files, so your column is named " typ", not "typ".
If you remove the space from the header in the CSV file, the same code gives you this:
Rating,Frequency,typ
0,1,O
1,2,E
Make certain your fields exactly match your data columns, and your chart should work as expected.
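A quick way to check a file for this kind of problem is to read just the header row and look for padded names. This is a small sketch using the standard csv module; padded_columns is a hypothetical helper name.

```python
import csv


def padded_columns(csv_path):
    """Return the header names that carry leading or trailing whitespace."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> no columns
    return [name for name in header if name != name.strip()]
```

An empty list means the header is clean. If editing the file is not an option, another possibility is to load it with pandas using pd.read_csv(path, skipinitialspace=True) and pass the resulting dataframe to alt.Chart instead of the file name.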

Keep a variable already created after a process is finished in python

I've built a GUI with wxPython in which I use a process to build a table to feed some charts when I click a button.
I build the table and store it in a variable, and use that information to feed my matplotlib chart.
My problem is that once the chart is finished and the process ends, I lose the contents of that variable. I need that same information to make my plot interactive (i.e. to change the plot from line to bar, or stacked, or whatever), but the only way I've found is to re-run the process that builds the table over and over again.
Is there a way to use the stored contents of that variable in other processes / modules / charts? I mean, is there a way to keep my variable "alive" even after the process where it was created has finished?
Thanks a lot for your guidance :)
This is done rather easily with the pickle module. Here is a simple working example:
from pickle import dumps, loads
a_variable = 15 # arbitrary value
with open("a_file.txt", "wb") as fileobj:
    # create a pickle string representation of the data
    fileobj.write(dumps(a_variable))
# Then to load it from another process
with open("a_file.txt", "rb") as fileobj:
    # load the pickle string representation of the data
    a_variable = loads(fileobj.read())
