Color "Undefined" in Altair graph - python

I'm using the altair python library to create html files with vega-lite specifications. I'm encountering a problem where color is not being displayed in my plot. Here is the code:
import altair as alt
import pandas
data = 'Test.csv'  # contains three columns: Rating, Frequency, and typ, where 'typ' is either E or O
a = alt.Chart(data).mark_bar().encode(
    alt.X('Rating', type='ordinal'),
    alt.Y('Frequency', type='quantitative'),
    alt.Color('typ', type='nominal')
)
a.save('altairtest.html')
I get a graph without colors, and the legend is titled 'typ' but contains only a single blue entry labeled "undefined".
I am currently working locally on a SimpleHttpsServer. Could this be the reason why? For my purposes it is easier this way than using jupyter. Thanks

This usually indicates that there is an issue in your data file. I can reproduce your issue with a Data.csv file that looks like this:
Rating,Frequency, typ
0,1,O
1,2,E
The resulting chart then shows the same problem.
Notice the space before typ in the header. Whitespace is significant in CSV files, so your column is named " typ", not "typ".
If you remove the space from the header so that the CSV file looks like this, the same code produces a correctly colored chart:
Rating,Frequency,typ
0,1,O
1,2,E
Make certain your fields exactly match your data columns, and your chart should work as expected.
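One way to catch this kind of mismatch before plotting is to list the header names that carry stray whitespace. A minimal sketch using only the standard library; the helper name padded_headers is made up for illustration, and the inline CSV text stands in for your file:

```python
import csv
import io

def padded_headers(csv_text):
    """Return header names that carry leading or trailing whitespace."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name != name.strip()]

# Same header as the broken file above: note the space before "typ".
broken = "Rating,Frequency, typ\n0,1,O\n1,2,E\n"
fixed = "Rating,Frequency,typ\n0,1,O\n1,2,E\n"

print(padded_headers(broken))  # [' typ']
print(padded_headers(fixed))   # []
```

If you load the file with pandas instead, pd.read_csv(path, skipinitialspace=True) drops the space after each delimiter.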

Related

Python Altair Load from JSON

I'm looking for help on how to load an Altair chart from a JSON file (it contains dict_keys(['$schema', 'config', 'datasets', 'title', 'vconcat'])).
The JSON file was created with the altair.Chart method to_json(), as shown below:
import altair as alt
chart = alt.Chart(df).mark_line(...).encode(...).properties(...).transform_filter(...)
chart_json = chart.to_json()
Here is a sample of the code I would like to run
chart = alt.load_json(chart_json) # made-up, needs replacement
chart.save('chart.png')
Disclaimer: I've never used altair and am trying to reverse-engineer a project. Thanks in advance for the help!
Altair can work directly with json files by specifying their path/url as shown in the documentation:
import altair as alt
from vega_datasets import data
url = data.cars.url # URL/path to json data
alt.Chart(url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q'
)
To load a local file, there are several potential issues, including the browser having access to local files and using the correct format for the frontend you're using (e.g. jupyterlab). These have been elaborated on elsewhere:
Reference local .csv file in altair chart
https://github.com/altair-viz/altair/issues/2529#issuecomment-982643293
https://github.com/altair-viz/altair/issues/2432
It sounds like you're looking for the alt.Chart.from_json method:
new_chart = alt.Chart.from_json(chart_json)
new_chart.display()
The method assumes that chart_json is a string of JSON representing a valid Altair chart.
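Since the two answers assume different kinds of JSON (a data file versus a saved chart specification), it can help to check which one you have before picking an approach. A rough sketch with the standard library only; looks_like_chart_spec is a hypothetical helper, and the check rests on the assumption that a chart saved with to_json() carries a "$schema" entry pointing at a vega-lite schema, as the keys listed in the question suggest:

```python
import json

def looks_like_chart_spec(json_text):
    """Guess whether a JSON string is a Vega-Lite chart spec rather than raw data."""
    parsed = json.loads(json_text)
    if not isinstance(parsed, dict):
        return False  # e.g. a list of records is data, not a chart
    return "vega-lite" in str(parsed.get("$schema", ""))

# Invented sample strings for illustration.
spec = '{"$schema": "https://vega.github.io/schema/vega-lite/v4.json", "vconcat": []}'
records = '[{"Horsepower": 130, "Miles_per_Gallon": 18}]'

print(looks_like_chart_spec(spec))     # True
print(looks_like_chart_spec(records))  # False
```

A string that passes this check is a candidate for alt.Chart.from_json, while a plain data file belongs in alt.Chart(url).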

Interactive visualization - select which csv to visualize

I'm writing interactive visualization code in Python.
What I would like to do is create an interactive visualization which allows the user to select a file from a dropdown menu (or something like that) and then draws a bar plot of the selected data.
My data folder has the following structure:
+-- it_features
| +-- it_2017-01-20--2017-01-27.csv
| +-- it_2017-01-27--2017-02-03.csv
| +-- it_2017-02-03--2017-02-10.csv
and so on (there are many more files, I'm just reporting few of them for simplicity).
So far I'm able to access and retrieve all the data contained in the folder:
import os
import pandas as pd
path = os.getcwd()
file_folder = os.path.join(path,'it_features')
for csv_file in os.listdir(file_folder):
    print(csv_file)
    file = os.path.join(file_folder,csv_file)
    df = pd.read_csv(file)
    #following code....
What I would like to do is create an interactive visualization which allows the user to select the file name (for example it_2017-02-03--2017-02-10.csv) and plot the data of that file.
I'm able to select "by hand" the file I want and plot its data by inserting its filename in a variable and then retrieving the data, but I would like not to insert it via code and allow the final user to browse and select one of the files using a dropdown menu or something similar.
My simple code:
import os
import pandas as pd
path = os.getcwd()
file_folder = os.path.join(path,'it_features')
file = os.path.join(file_folder,'it_2020-02-07--2020-02-14.csv') # Here I insert my filename
df=pd.read_csv(file)
ax=df.value_counts(subset=['Artist']).head(10).plot(y='number of songs',kind='bar', figsize=(15, 7), title="7-14 February 2020")
ax.set_xlabel("Artist")
ax.set_ylabel("Number of Songs Top 200")
Which generates the following plot:
As I already said, I would like to introduce some kind of dropdown menu that allows the user to select the csv data they want to plot using an interactive plot.
I saw that it's possible to create dropdown menus with Plotly, but in the various examples (https://plotly.com/python/dropdowns/) it doesn't seem to select and then load the data.
I also saw this code (Kaggle code) which seems to do what I wanted to do: you can select the region and plot the data from that region.
The main problem is that the author just creates one big dataframe with all US states, and then creates a trace for each one of them.
What I would like to do (if possible) is to select the file name from the dropdown, load the csv and then plot its data, without creating a single giant dataframe with all my files in it.
Is it possible?
EDIT: The solution proposed by gherka works perfectly, but I would like to have a solution inside Plotly using its dropdown menu.
Since you're working in Jupyter Notebook, you have a number of different options.
Some visualisation libraries will have built-in widgets that you can use, however they would often require you to run a server or provide a javascript callback. For a library-agnostic approach, you can use ipywidgets. This library is specifically for creating widgets to be used in Jupyter Notebooks. The documentation is here.
To create a simple dropdown with a static bar plot underneath, you would need three widgets - Label for dropdown description, Dropdown and Output. VBox is for laying them out.
from ipywidgets import VBox, Label, Dropdown, Output
desc = Label("Pick a .csv to plot:")
dropdown = Dropdown(
    options=['None', 'csv1', 'csv2', 'csv3'],
    value='None',
    disabled=False)
output = Output()
# generate_plot (shown below) must already be defined when this line runs
dropdown.observe(generate_plot, names="value")
VBox([desc, dropdown, output])
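The options list above is hard-coded. A minimal sketch of building it from a folder such as it_features in the question instead; csv_options is a made-up helper name, the leading 'None' entry simply mirrors the placeholder used above, and the demo runs on a throwaway folder so it works anywhere:

```python
import os
import tempfile

def csv_options(folder):
    """Return dropdown options: a 'None' placeholder plus the sorted .csv file names."""
    names = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))
    return ["None"] + names

# Demo on a temporary folder with two CSV files and one unrelated file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("it_2017-01-27--2017-02-03.csv",
                 "it_2017-01-20--2017-01-27.csv",
                 "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    options = csv_options(folder)

print(options)
```

The result could then be passed as Dropdown(options=csv_options(file_folder), value='None').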
The key element is the generate_plot function. It must have a single parameter that you use to decide what effect the widget action has on your plot. When you interact with the dropdown, the generate_plot function will be called and passed a dictionary with "new" value, "old" value and a few other things.
Here's a function to generate a basic seaborn bar chart with an adjustable data source. Notice I had to include an explicit plt.show() - plots won't render otherwise.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def generate_plot(change):
    with output:
        output.clear_output()  # reset the view
        if change["new"] != "None":
            data = pd.read_csv(...)  # your custom code based on dropdown selection
            sns.catplot(x="Letters", y="Numbers", kind="bar", data=data)
            plt.show()  # explicit show, otherwise the plot won't render
If you have many large .csv files, you might also want to implement a caching system that keeps the last few user selections in memory, so that they are not re-read from disk on each selection.
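One way to sketch such a cache is functools.lru_cache from the standard library, which keeps the most recently used results and evicts the least recently used one. The loader below is an assumption for illustration: load_rows is a made-up name, it reads with the csv module rather than pandas to stay dependency-free, and maxsize=3 is an arbitrary choice:

```python
import csv
import functools
import os
import tempfile

@functools.lru_cache(maxsize=3)  # keep only the 3 most recently used files
def load_rows(path):
    """Read a CSV file once; repeated calls with the same path hit the cache."""
    with open(path, newline="") as f:
        return tuple(csv.DictReader(f))  # tuple so callers cannot append to the cached result

# Demo with a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.csv")
    with open(path, "w", newline="") as f:
        f.write("Artist,Song\nA,x\nB,y\n")
    first = load_rows(path)   # reads from disk
    second = load_rows(path)  # served from the cache

print(load_rows.cache_info().hits)  # 1
print(first is second)              # True
```

With pandas, the same decorator can wrap a function that returns pd.read_csv(path); treat the returned DataFrame as read-only, since every caller shares the cached object.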
For a more in-depth look at how to add interactivity to matplotlib plots using ipywidgets I found this tutorial quite useful.
tkinter is a super common UI framework for python, and is part of the standard library. Based on answers in a similar question, you can use this:
from tkinter.filedialog import askopenfilename
filename = askopenfilename()
which pops up a standard file explorer window.

How to sum up yes and no into a total quantity, using matplotlib, pandas, python from a CSV import to plot a graph

QUESTION #1) I am new to Python and coding in general. I want to take my data from a CSV which has a column labeled "U.S. OSHA Recordable?". In that column every answer is either "yes" or "no". I want to display a plot.bar that shows "23 yes" and "7 no": essentially adding up the totals of "yes" and "no" in the column, then displaying them in one clean bar graph with 2 bars and the total number on top of each bar. The problem is that the bar graph currently has a separate entry on the X axis for every row, reading "no, yes, no, yes, yes, no" about 27 individual times. I want users to easily see one bar graph showing only 2 bars with the totals on top, like this image.
This is my code. I am not sure what I would need to sum up the Yes and No in the column.
import pandas as pd # data analysis library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top; encoding lets pandas decode this CSV
data.head() # shows the first five rows, to check that the header was read correctly
data.plot.bar(x='U.S. OSHA Recordable?') # creates a plot in pandas
plt.show() # shows the plot to the user
df['Val'].value_counts().plot(kind='bar')
Here Val is the name of the column that contains 'Yes' & 'No'
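To see what value_counts() computes before plotting, the same tally can be sketched with collections.Counter from the standard library (a stand-in for the pandas call, not what the answer itself uses). The answers list is invented sample data chosen to match the 23/7 split described in the question:

```python
from collections import Counter

# Invented sample column: 23 "yes" and 7 "no" answers.
answers = ["yes"] * 23 + ["no"] * 7

counts = Counter(answers)
for label, total in counts.most_common():  # most frequent first, like value_counts()
    print(label, total)
```

Each (label, total) pair corresponds to one bar, so a column of 30 answers collapses into two bars.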
import pandas as pd # data analysis library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
import seaborn as sns # it counts everything for you and outputs it exactly like I want
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top; encoding lets pandas decode this CSV
sns.set(style="whitegrid")
ax = sns.countplot(x='U.S. OSHA Recordable?', data=data) # one bar per distinct answer, with height equal to its count
plt.show() # shows the plot to the user
So interestingly enough I found out about "seaborn" I pip installed it and gave it a shot. It is supposed to pull data from a URL, but after viewing a few other pages on stack overflow I found a great suggestion. Anyways, this works great and it does everything for me. I am so happy with this solution. Now onto the next problem lol. I hope this helps someone else in the future.
My graph looks exactly like the one posted by SH-SF btw. Works great

Creating a kml file for google earth - Heat/colour map required

I have csv data containing latitude and longitude. I wish to create a 'heatmap' whereby some placemarks are certain colours based on values corresponding to each lat+long point in the csv.
import simplekml
import csv
kml = simplekml.Kml()
kml.document.name = "Test"
with open('final.csv', newline='') as f:
    reader = csv.reader(f)
    first_row = next(reader)  # skip the csv header row
    for col in reader:
        long = col[1]
        lat = col[2]
        pnt = kml.newpoint()
        pnt.name = 'test-name'
        pnt.coords = [(long, lat, 10000)]
        pnt.description = col[0]  # timestamp data
        pnt.style.labelstyle.color = 'ff0000ff'
kml.save("test.kml")
This script creates a kml file that on inspection with google earth presents the data points but I want some kind of graphical input too.
It doesn't seem like the simplekml package supports such things. Any advice on which Python package is best for getting a 'heatmap' or something along those lines? I could also add KML elements directly in the script, but the documentation seems rather limited for the solutions I require.
Thanks
Sounds like you're looking to create a "thematic map", not a "heatmap".
Looks like you're using pnt.style.labelstyle.color = ... to add colors. LabelStyle refers to the text labels associated with icons, not the icons themselves. What you probably want is to refer to differently colored icons based on your attribute values. You should be able to do that with: pnt.style.iconstyle.icon.href = .... Either specify a colored icon image, or use a white image and apply a color with style.iconstyle.color = ....
Best practice for KML would be to set up several shared styles, one for each icon color, and then apply the relevant style to each point with pnt.style = ....
Implementation details are in the simplekml documentation.
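When picking the color strings, keep in mind that KML writes colors as eight hex digits in alpha, blue, green, red order (aabbggrr), which is why 'ff0000ff' in the question is opaque red rather than blue. A small sketch, with kml_color and color_for_value as made-up helper names and the thresholds chosen arbitrarily:

```python
def kml_color(red, green, blue, alpha=255):
    """Format 0-255 channel values as a KML aabbggrr hex string."""
    return "{:02x}{:02x}{:02x}{:02x}".format(alpha, blue, green, red)

def color_for_value(value, low=10.0, high=20.0):
    """Map a numeric value to green, yellow, or red; the thresholds are arbitrary."""
    if value < low:
        return kml_color(0, 255, 0)
    if value < high:
        return kml_color(255, 255, 0)
    return kml_color(255, 0, 0)

print(kml_color(255, 0, 0))   # ff0000ff
print(color_for_value(5.0))   # ff00ff00
print(color_for_value(15.0))  # ff00ffff
```

The returned string can be assigned to style.iconstyle.color as described above, using the value column of your CSV as the input.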

Plotly graph streaming, getting data from local text file creates weird behavior

I used the plotly streaming API from Python (plot.ly/python/streaming-tutorial) to set up a dashboard with graphs showing data streamed from local logfiles (.txt).
I followed the tutorial to create a graph of a data stream; reproducing the "Getting started" example worked fine (although I had to change py.iplot() to py.plot()).
I made some small modifications to the working example code to get the Y-axis value from a text file on my local drive. It does manage to plot the value written in my text file on the graph and even update it as I modify the value in the text file, but it behaves differently than the graph produced by the example code for a streamed plotly graph. I include both my code for the "Example" graph and my code for the "Data from local text file" graph; and images of the different behaviors.
The first two images show the Plot and Data produced by the "Example" code, and the last two those produced by the "Data from local text file" code: http://imgur.com/a/ugo6m
The interesting thing here is that in the first case (Example), the updated value of Y is shown on a new line in the Data tab. This is not the case in the second case (Data from local text file). In the second case, the Y value is updated, but always takes the place of the first line. Instead of adding a new Y point and storing the previous one, it just constantly modifies the first value that Y received. I think the problem comes from there.
Here's a link for both codes. They're short and only the last few lines matter: I suppose the problem comes from there, since those lines are the only difference between the two. I tried different working expressions to read the value from the text file (such as "with open('data.txt', 'r')") but nothing does it. Does anyone know how to make it work properly?
(!!!Careful both codes run an infinite loop!!!)
"Example": http://pastebin.com/6by30ANs
"Data from local text file": see below
Thanks in advance for your time,
PS: I had to put my second code here below as I do not have enough reputation to put more than 2 links.
import plotly.plotly as py
import plotly.tools as tls
import plotly.graph_objs as go
import datetime
import time
tls.set_credentials_file(username='lo.dewaele', stream_ids = ['aqwoq4i2or'], api_key='PNASXMZQmLmAVLtthYq2')
stream_ids = tls.get_credentials_file()['stream_ids']
stream_id = stream_ids[0]
stream_1 = dict(token=stream_id, maxpoints=20)
trace1 = go.Scatter(
x=[],
y=[],
mode='lines+markers',
stream=stream_1
)
data = go.Data([trace1])
layout = go.Layout(title='Time Series')
fig = go.Figure(data=data, layout=layout)
py.plot(fig, filename='stream')
s = py.Stream(stream_id)
s.open()
time.sleep(1)
while True:
    graphdata = open('graphdata.txt', 'r') #open file in read mode
    y = [graphdata.readline()] #read the first line of the file (just one integer)
    x = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')
    s.write(dict(x=x, y=y))
    graphdata.close() #close the file
    time.sleep(1)
s.close()
