Python Altair Load from JSON - python

I'm looking for help loading an Altair chart from a JSON file (its top-level keys are dict_keys(['$schema', 'config', 'datasets', 'title', 'vconcat'])).
The JSON file was created with the altair.Chart method to_json(), as shown below:
import altair as alt
chart = alt.Chart(df).mark_line(...).encode(...).properties(...).transform_filter(...)
chart_json = chart.to_json()
Here is a sample of the code I would like to run:
chart = alt.load_json(chart_json) # made-up, needs replacement
chart.save('chart.png')
Disclaimer: I've never used altair and am trying to reverse-engineer a project. Thanks in advance for the help!

Altair can work directly with json files by specifying their path/url as shown in the documentation:
import altair as alt
from vega_datasets import data
url = data.cars.url # URL/path to json data
alt.Chart(url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q'
)
To load a local file, there are several potential issues, including the browser having access to local files and using the correct format for the frontend you're using (e.g. jupyterlab). These have been elaborated on elsewhere:
Reference local .csv file in altair chart
https://github.com/altair-viz/altair/issues/2529#issuecomment-982643293
https://github.com/altair-viz/altair/issues/2432

It sounds like you're looking for the alt.Chart.from_json method:
new_chart = alt.Chart.from_json(chart_json)
new_chart.display()
The method assumes that chart_json is a string of JSON representing a valid Altair chart.

Related

Import csv from Kaggle url into a pandas DataFrame

I want to import a public dataset from Kaggle (https://www.kaggle.com/unsdsn/world-happiness?select=2017.csv) into a local Jupyter notebook. I don't want to use any credentials in the process.
I have seen various solutions, including pd.read_html, pd.read_csv and pd.read_table (pd = pandas).
I also found solutions that require a login.
The first set of solutions is the one I am interested in, though I see that they work on other websites because those provide a link to the raw data.
I have been clicking everywhere in the Kaggle interface but cannot find a direct URL to the raw data.
Bottom line: Is it possible to use say pd.read_csv to directly get data from the website into your local notebook? If so, how?
You can automate kaggle.cli.
First follow the instructions at https://github.com/Kaggle/kaggle-api to download and save kaggle.json for authentication.
import kaggle.cli
import sys
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
# download data set
# https://www.kaggle.com/unsdsn/world-happiness?select=2017.csv
dataset = "unsdsn/world-happiness"
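# kaggle.cli.main() parses sys.argv, so emulate the command line
sys.argv = [sys.argv[0]] + f"datasets download {dataset}".split(" ")
kaggle.cli.main()
# the archive is saved in the working directory as <dataset name>.zip
zfile = ZipFile(f"{dataset.split('/')[1]}.zip")
# read every file in the archive into a dict of DataFrames keyed by file name
dfs = {f.filename: pd.read_csv(zfile.open(f)) for f in zfile.infolist()}
dfs["2017.csv"]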

Open plotly html file from script Python/R

I have a Plotly graph stored in an HTML file, file.html. It was created in another script using, for example:
import plotly.graph_objects as go
fig = go.Figure(data=go.Bar(y=[2, 3, 1]))
fig.write_html('file.html', auto_open=True)
Is there a way to open the HTML file directly without rewriting the Plotly code?
Something like:
fig = go.read_html('file.html')
I need the plot inside the variable. For example this has to work:
fig = read html file file.html
fig.write_html('copyOfFile.html', auto_open=True)
I use both Python and R so I'd like a solution for both/one of them.
You can use the webbrowser module from the Python standard library. It opens the document in the default browser.
import webbrowser
webbrowser.open('file.html')
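Passing a bare relative path to webbrowser.open may behave differently from one platform to another, so here is a small sketch that converts the path to an absolute file:// URI first. The helper names html_file_uri and open_html are made up for this example.

```python
import webbrowser
from pathlib import Path


def html_file_uri(path):
    """Return an absolute file:// URI for a local HTML file."""
    return Path(path).resolve().as_uri()


def open_html(path):
    """Open a local HTML file in the default browser; returns True on success."""
    return webbrowser.open(html_file_uri(path))


# Usage (opens a browser window, so it is left commented out here):
# open_html("file.html")
```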

Interactive visualization - select which csv to visualize

I'm writing interactive visualization code in Python.
What I would like to do is create an interactive visualization that lets the user select a file from a dropdown menu (or something similar) and then draws a bar plot of the selected data.
My data folder has the following structure:
+-- it_features
| +-- it_2017-01-20--2017-01-27.csv
| +-- it_2017-01-27--2017-02-03.csv
| +-- it_2017-02-03--2017-02-10.csv
and so on (there are many more files; I'm showing only a few of them for simplicity).
So far I'm able to access and retrieve all the data contained in the folder:
import os
import pandas as pd
path = os.getcwd()
file_folder = os.path.join(path,'it_features')
for csv_file in os.listdir(file_folder):
    print(csv_file)
    file = os.path.join(file_folder, csv_file)
    df = pd.read_csv(file)
    # following code....
What I would like to do is create an interactive visualization which allows the user to select the file name (for example it_2017-02-03--2017-02-10.csv) and plot the data of that file.
I'm able to select "by hand" the file I want and plot its data by inserting its filename in a variable and then retrieving the data, but I would like not to insert it via code and allow the final user to browse and select one of the files using a dropdown menu or something similar.
My simple code:
import os
import pandas as pd
path = os.getcwd()
file_folder = os.path.join(path,'it_features')
file = os.path.join(file_folder,'it_2020-02-07--2020-02-14.csv') # Here I insert my filename
df=pd.read_csv(file)
ax=df.value_counts(subset=['Artist']).head(10).plot(y='number of songs',kind='bar', figsize=(15, 7), title="7-14 February 2020")
ax.set_xlabel("Artist")
ax.set_ylabel("Number of Songs Top 200")
Which generates the following plot:
As I already said, I would like to add some kind of dropdown menu that lets the user select the CSV data they want to plot in an interactive plot.
I saw that it's possible to create dropdown menus with Plotly, but in the various examples (https://plotly.com/python/dropdowns/) it doesn't seem to select and then load the data.
I also saw this code (Kaggle code) which seems to do what I wanted to do: you can select the region and plot the data from that region.
The main problem is that it just builds one big DataFrame with all the US states and then creates a trace for each of them.
What I would like to do (if possible) is select the file name from the dropdown, load that CSV and then plot its data, without creating a single giant DataFrame with all my files in it.
Is it possible?
EDIT: The solution proposed by gherka works perfectly, but I would like to have a solution inside Plotly using its dropdown menu.
Since you're working in Jupyter Notebook, you have a number of different options.
Some visualisation libraries will have built-in widgets that you can use, however they would often require you to run a server or provide a javascript callback. For a library-agnostic approach, you can use ipywidgets. This library is specifically for creating widgets to be used in Jupyter Notebooks. The documentation is here.
To create a simple dropdown with a static bar plot underneath, you would need three widgets - Label for dropdown description, Dropdown and Output. VBox is for laying them out.
from ipywidgets import VBox, Label, Dropdown, Output
desc = Label("Pick a .csv to plot:")
dropdown = Dropdown(
    options=['None', 'csv1', 'csv2', 'csv3'],
    value='None',
    disabled=False)
output = Output()
# generate_plot (defined below) must already exist when this line runs
dropdown.observe(generate_plot, names="value")
VBox([desc, dropdown, output])
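The options list in the snippet above is hard-coded. As a rough sketch, it could be built from the folder contents instead; csv_options is a hypothetical helper name, and the it_features folder is taken from the question.

```python
from pathlib import Path


def csv_options(folder):
    """Return Dropdown options: the 'None' placeholder plus every .csv file name, sorted."""
    names = sorted(p.name for p in Path(folder).glob("*.csv"))
    return ["None"] + names


# Usage with the snippet above (assumes the it_features folder from the question):
# dropdown = Dropdown(options=csv_options("it_features"), value="None")
```

Keeping "None" as the first entry matches the placeholder check used in generate_plot.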
The key element is the generate_plot function. It must have a single parameter that you use to decide what effect the widget action has on your plot. When you interact with the dropdown, the generate_plot function will be called and passed a dictionary with "new" value, "old" value and a few other things.
Here's a function to generate a basic seaborn bar chart with an adjustable data source. Notice I had to include an explicit plt.show() - plots won't render otherwise.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def generate_plot(change):
    with output:
        output.clear_output()  # reset the view
        if change["new"] != "None":
            data = pd.read_csv(...)  # your custom code based on dropdown selection
            sns.catplot(x="Letters", y="Numbers", kind="bar", data=data)
            plt.show()  # render the figure seaborn just drew inside the Output widget
If you have many large .csv files, one other thing you might want to do is implement a caching system, so that you keep the last few user selections in memory and avoid re-reading them on each selection.
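One simple way to get that kind of cache is functools.lru_cache from the standard library. This is only a sketch: load_csv is a made-up helper name and maxsize=3 is an arbitrary choice for "the last few" files.

```python
from functools import lru_cache

import pandas as pd


@lru_cache(maxsize=3)  # keep the three most recently used files in memory
def load_csv(path):
    """Read a CSV once; repeated calls with the same path return the cached DataFrame."""
    return pd.read_csv(path)
```

Inside generate_plot you would then call load_csv(file_path) instead of pd.read_csv. The cached DataFrame is shared between calls, so avoid modifying it in place.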
For a more in-depth look at how to add interactivity to matplotlib plots using ipywidgets I found this tutorial quite useful.
tkinter is a super common UI framework for python, and is part of the standard library. Based on answers in a similar question, you can use this:
from tkinter.filedialog import askopenfilename
filename = askopenfilename()
which pops up a standard file explorer window.

Color "Undefined" in Altair graph

I'm using the altair python library to create html files with vega-lite specifications. I'm encountering a problem where color is not being displayed in my plot. Here is the code:
import altair as alt
import pandas
data = 'Test.csv' #this contains three columns: Rating, Frequency, and typ, where 'typ' is either E or O.
a = alt.Chart(data).mark_bar().encode(
    alt.X('Rating', type = 'ordinal'),
    alt.Y('Frequency',type = 'quantitative'),
    alt.Color('typ', type = 'nominal')
)
a.save('altairtest.html')
I get a graph without colors: the legend is titled 'typ' but shows only a single blue entry, with text reading "undefined".
I am currently working locally on a SimpleHttpsServer. Could this be the reason why? For my purposes it is easier this way than using jupyter. Thanks
This usually indicates that there is an issue in your data file. I can reproduce your issue with a Data.csv file that looks like this:
Rating,Frequency, typ
0,1,O
1,2,E
Then the resulting chart looks like this:
Notice the space before typ in the header. Spaces are significant in CSV files, so your column is named " typ", not "typ".
If you remove the space from the header in the CSV file, the same code gives you this:
Rating,Frequency,typ
0,1,O
1,2,E
Make certain your fields exactly match your data columns, and your chart should work as expected.
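If you want to check a file for this kind of problem before charting it, a quick sketch with the standard csv module could look like this (padded_columns is a hypothetical helper name, and it only inspects the first row):

```python
import csv


def padded_columns(csv_path):
    """Return header names that carry leading or trailing whitespace."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return [name for name in header if name != name.strip()]
```

For the first CSV shown above it would return [' typ'], and an empty list for the corrected one.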

GeoJSON data not displaying in Python folium map

I am trying to display the following geojson file in a folium map in Python but it just shows an empty map with none of the data.
Here are the steps I have tried:
I tried using the python code below but nothing shows up.
I tried other geojson files in the github repository below using the same code and the data show up without any issue, so it looks like my python code is fine
I opened the "census_tracts_2010.geojson" file in github and Mapshaper, the data showed up perfectly without any issue, so it doesn't look like the geojson file is corrupted
Could anyone please let me know how I can fix it?
Geojson file:
https://github.com/dwillis/nyc-maps/blob/master/census_tracts_2010.geojson
Python code:
import folium
m = folium.Map(location=[40.66393072,-73.93827499], zoom_start=13)
m.choropleth(geo_path="census_tracts_2010.geojson")
m.save(outfile='datamap.html')
Thanks a lot!
That file is not GeoJSON; it is TopoJSON (despite the .geojson extension). You need to use folium.TopoJson instead.
import folium
m = folium.Map(location=[40.66393072,-73.93827499], zoom_start=13)
folium.TopoJson(
    open('census_tracts_2010.geojson'),
    object_path='objects.nyct2010',
).add_to(m)
m
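To tell the two formats apart yourself, you can look at the top-level "type" member: TopoJSON files use "Topology" and keep their geometries under "objects", whereas GeoJSON files use types such as "FeatureCollection". A small sketch with a made-up helper name, describe_geodata:

```python
import json


def describe_geodata(path):
    """Return (top-level type, object names); object names are empty unless it is TopoJSON."""
    with open(path) as f:
        data = json.load(f)
    kind = data.get("type")
    objects = list(data.get("objects", {})) if kind == "Topology" else []
    return kind, objects
```

Each object name it reports is what goes after "objects." in object_path (nyct2010 in the answer above).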
You need to open the geojson file.
m.choropleth(open("census_tracts_2010.geojson"))
Take a look at the examples https://folium.readthedocs.io/en/latest/quickstart.html
Try this: m.add_child(folium.GeoJson(data=open("census_tracts_2010.geojson"))) and then call m.save().
