I am getting started with Google Facets, but I am finding the documentation insufficient.
I want to use Facets Dive to visualize images like they do here for CIFAR-10. There is another example here using the Quick, Draw! dataset.
However, I cannot find out how to set it up.
This is roughly what I have so far; the code works all right:
from sklearn.datasets import fetch_mldata
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np
from IPython.core.display import display, HTML
#load data:
mnist = fetch_mldata("MNIST original")
idx = np.random.randint(70000, size=1000) #get random 1000 samples
X = mnist.data[idx]/255.0
y = mnist.target[idx]
#dimensionality reduction to get 3 features
pca = PCA(n_components=3)
pca_result = pca.fit_transform(X)
#put everything in a dataframe
df = df_pca = pd.DataFrame(data = pca_result, columns = ["pca-one", "pca-two", "pca-three"])
df['y'] = y
#display facets dive
jsonstr = df.to_json(orient='records')
HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html">
<facets-dive id="elem" height="600"></facets-dive>
<script>
var data = {jsonstr};
document.querySelector("#elem").data = data;
</script>"""
html = HTML_TEMPLATE.format(jsonstr=jsonstr)
display(HTML(html))
This script works fine, but I only get circles with the labels (or whichever feature I choose), and I don't see how to integrate the actual images. The only hint I have so far is that I need facets_atlasmaker for that, but I found its documentation rather insufficient.
If something is not clear, please let me know in the comments and I will try to add more relevant information.
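For context, Facets Dive appears to draw per-record images from a single sprite atlas, one large image in which the thumbnails are tiled in the same order as the records. A minimal sketch of building such an atlas with NumPy follows. The helper name make_sprite_atlas is hypothetical, and the row-major tiling order is an assumption based on the Facets README that should be checked against it.

```python
import math
import numpy as np

def make_sprite_atlas(images, sprite_height, sprite_width):
    """Tile flattened grayscale images into one row-major atlas array."""
    images = np.asarray(images).reshape(-1, sprite_height, sprite_width)
    n = images.shape[0]
    cols = math.ceil(math.sqrt(n))   # sprites per atlas row (assumed layout)
    rows = math.ceil(n / cols)
    atlas = np.zeros((rows * sprite_height, cols * sprite_width), dtype=images.dtype)
    for k, img in enumerate(images):
        r, c = divmod(k, cols)       # row-major position of sprite k
        atlas[r * sprite_height:(r + 1) * sprite_height,
              c * sprite_width:(c + 1) * sprite_width] = img
    return atlas

# Random stand-in data shaped like 10 flattened MNIST digits (28x28 pixels).
demo = np.random.rand(10, 28 * 28)
atlas = make_sprite_atlas(demo, 28, 28)
print(atlas.shape)
```

The atlas would then be saved as an image and handed to the element. As far as I can tell from the README, that means setting sprite-image-width and sprite-image-height on <facets-dive> and assigning the image URL to its atlasUrl property, but treat those names as assumptions to verify.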
Let's say that I have two 1-D arrays with two different statistical distributions. I want to match both distributions, using one of them as the "target".
In the example, I "shifted" one of the distributions using MinMaxScaler() from scikit-learn to match it with the other one, but I am sure I can achieve an "automatic" and "better" match with some API or some code.
In the example I have both arrays in the same DataFrame (and both have the same length), but I'd be very pleased if somebody knows a way to achieve this using two different DataFrames and/or two arrays with different lengths.
Thank you!
CODE
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import plotly.figure_factory as ff
################## DATA ######################
np.random.seed(54)
crv = np.random.uniform(1,99,(1,100)).flatten()
np.random.seed(115)
crv_target = np.random.uniform(51,149,(1,100)).flatten()
# Create DataFrame
df = pd.DataFrame(data=[crv, crv_target]).T
df = df.rename(columns={0: "crv", 1: "crv_target"})
# Scaler
scale = MinMaxScaler(feature_range=(50,150))
df['crv_shifted'] = scale.fit_transform(X=df['crv'].values.reshape(-1, 1),y=df['crv_target'].values.reshape(-1, 1))
# Create distplot
data = [df['crv_shifted'],df['crv_target'],df['crv']]
labels = ['crv_shifted','crv_target','crv']
colors = ['#F8C471', '#22D2E6','#CD6155']
fig = ff.create_distplot(data, labels,show_hist=False,show_rug=False,colors=colors)
fig.show()
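One common way to match a sample to a target distribution is quantile mapping: each source value is replaced by the target value at the same empirical quantile. The sketch below is not from the original post; match_distribution is a hypothetical helper, and it works with arrays of different lengths because it relies only on np.quantile.

```python
import numpy as np

def match_distribution(source, target):
    """Map each source value onto the target's empirical quantile function."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    # Rank of every source value, converted to a quantile level in (0, 1).
    ranks = np.argsort(np.argsort(source))
    levels = (ranks + 0.5) / source.size
    # Read the target distribution at those quantile levels.
    return np.quantile(target, levels)

rng = np.random.default_rng(54)
crv = rng.uniform(1, 99, 100)            # source sample
crv_target = rng.uniform(51, 149, 250)   # target sample with a different length
crv_matched = match_distribution(crv, crv_target)
print(crv_matched.min(), crv_matched.max())
```

The matched array keeps the ordering of the source values but takes its range and shape from the target, so it can be stored back alongside crv even when the target lives in a different DataFrame.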
Here is my code, along with an image of my dataset "Market_Basket_Optimisation". I have built a list of lists, transactions, to pass as input to the apriori algorithm, but I am not getting any rules. I am new to machine learning and I cannot find the error.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Data Preprocessing
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i,j]) for j in range(0, 20)])
# Training Apriori on the dataset
from apyori import apriori
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)
# Visualising the results
results = list(rules)
It is not clear from your question whether you are using a Jupyter notebook or an IDE such as Spyder. If you are using an IDE such as Spyder, you are not likely to see the result unless you use a print statement. I suggest adding another line as follows:
print(results)
You should see the list of rules. This is the same issue I had, and using the print statement solved the problem for me. You will still need to define a function to output the result in a tabular format that makes sense.
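A minimal sketch of such a function is shown below. It assumes the records returned by apyori expose the fields items, support and ordered_statistics, and that each ordered statistic exposes items_base, items_add, confidence and lift. The helper name rules_to_rows is hypothetical, and the demo uses stand-in namedtuples with made-up numbers so that it runs without the dataset.

```python
from collections import namedtuple

def rules_to_rows(results):
    """Flatten apyori-style records into (base, add, support, confidence, lift) rows."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# Stand-in records that mimic the assumed apyori structure; the values are made up.
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")
demo_results = [
    RelationRecord(
        frozenset({"pasta", "escalope"}), 0.005,
        [OrderedStatistic(frozenset({"pasta"}), frozenset({"escalope"}), 0.37, 4.7)],
    ),
]
for row in rules_to_rows(demo_results):
    print(row)
```

With the real output, rules_to_rows(results) could be passed to pd.DataFrame together with a list of column names to get a table.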
I'm trying to get started with the statsmodels package to make Q-Q plots. I installed it from source using the master branch with Python 3.6. I want to make a Q-Q plot comparing two data distributions of different sample sizes. I'm trying to run the example code from the documentation, but it throws an error about the different sample sizes.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.api as sm
from statsmodels.graphics.gofplots import qqplot
# example 6
x = np.random.normal(loc=8.25, scale=2.75, size=37)
y = np.random.normal(loc=8.75, scale=3.25, size=57)
pp_x = sm.ProbPlot(x, fit=True)
pp_y = sm.ProbPlot(y, fit=True)
fig = pp_x.qqplot(line='45', other=pp_y)
title = 'Ex. 6 - qqplot - compare different sample sizes'
h = plt.title(title)
plt.show()
I get this error:
ValueError: x and y must have same first dimension, but have shapes (57,) and (37,)
Has anyone gotten this feature to work?
You should write
x = np.random.normal(loc=8.25, scale=2.75, size=(37, value-of-a-dimension-for-your-code))
like
x = np.random.normal(loc = 0, scale = 1, size = (3,3))
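Independently of the statsmodels call, a two-sample Q-Q comparison of unequal sample sizes can be computed by hand: evaluate both samples at the same quantile levels so that the resulting arrays have equal length. The sketch below is only an illustration of that idea. The helper qq_points is hypothetical, and it is not a claim about how statsmodels handles the case internally.

```python
import numpy as np

def qq_points(x, y):
    """Return matched quantiles of two samples that may differ in size."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(x.size, y.size)              # use the smaller sample's resolution
    levels = (np.arange(n) + 0.5) / n    # evenly spaced quantile levels
    return np.quantile(x, levels), np.quantile(y, levels)

rng = np.random.default_rng(0)
x = rng.normal(loc=8.25, scale=2.75, size=37)
y = rng.normal(loc=8.75, scale=3.25, size=57)
qx, qy = qq_points(x, y)
print(qx.shape, qy.shape)
```

Plotting qy against qx, for example with plt.scatter(qx, qy) plus a 45-degree reference line, gives the two-sample Q-Q plot.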
I'm currently teaching myself pandas and Python for machine learning. I've done fine with text data thus far, but dealing with image data with limited knowledge of Python and pandas is tripping me up.
I have read a .csv file into a pandas DataFrame, and one of its columns contains the URL of an image. This is what is shown when I get info from the DataFrame:
dataframe = pandas.read_csv("./sample.csv")
dataframe.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total of 5 columns):
name 5000 non-null object
...
image 5000 non-null object
The image column contains the URL of the image. The problem is that I do not know how to load the image data from it and save it as a NumPy array for processing.
Any help is appreciated. Thanks in advance!
If you want to download the images from the web and then, for example, rotate your images from your dataframe, and save the results you can use the following code:
import io
from urllib.request import urlopen  # Python 3 replacement for urllib2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
df = pd.DataFrame({
    "name": ["Butterfly", "Birds"],
    "image": ["https://upload.wikimedia.org/wikipedia/commons/0/0c/Two-tailed_pasha_%28Charaxes_jasius_jasius%29_Greece.jpg",
              'https://upload.wikimedia.org/wikipedia/commons/c/c5/Bat_cave_in_El_Maviri_Sinaloa_-_Mexico.jpg']})
def rotate_image(image, theta):
    """
    Apply a 3D rotation matrix around the X-axis by angle theta
    to the last (colour channel) axis of the image array
    """
    rotation_matrix = np.c_[
        [1,0,0],
        [0,np.cos(theta),-np.sin(theta)],
        [0,np.sin(theta),np.cos(theta)]
    ]
    return np.einsum("ijk,lk->ijl", image, rotation_matrix)
for i, image_url in enumerate(df["image"]):
    print(image_url)
    fd = urlopen(image_url)
    image_file = io.BytesIO(fd.read())  # keep the download in memory
    im = Image.open(image_file)
    im_rotated = rotate_image(np.asarray(im), np.pi)  # PIL image -> NumPy array
    fig = plt.figure()
    plt.imshow(im_rotated)
    plt.axis('off')
    fig.savefig(df["name"].iloc[i] + ".jpg")
If instead you want to show the pictures you can do:
plt.show()
The resulting pictures show the birds and the butterfly.
As we don't know your CSV file, you have to tune your pd.read_csv() call for your case.
Here I'm using requests to download some images in memory.
These are then decoded with the help of scipy (which you should already have; if not, you can use Pillow too).
The decoded images are then raw NumPy arrays and are shown with matplotlib.
Keep in mind that we are not using temporary files here and everything is held in memory. Read also this (answer by jfs).
For people missing some of the required libraries, the same should be possible with the following substitutions (the code needs to be changed, of course):
requests can be replaced with urllib (standard lib)
I'm not showing code, but this SO question should be a good start
another relevant SO question talks about in-memory processing with urllib
pandas can be replaced by csv (standard lib)
scipy can be replaced by Pillow (although the internal storage might differ then)
matplotlib is just for demo purposes (not sure if Pillow allows showing images; edit: it seems it can)
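As an illustration of the Pillow substitution, here is a minimal sketch that decodes image bytes held in memory into a NumPy array. The helper name decode_image is hypothetical. The demo encodes a tiny image itself so that it runs without network access; in practice the bytes would come from a download.

```python
import io
import numpy as np
from PIL import Image

def decode_image(raw_bytes):
    """Decode encoded image bytes (JPEG, PNG, ...) into an RGB NumPy array."""
    with Image.open(io.BytesIO(raw_bytes)) as im:
        # Converting to RGB drops any alpha channel, so the shape is (height, width, 3).
        return np.asarray(im.convert("RGB"))

# Self-contained demo: encode a small solid-red image in memory, then decode it.
buffer = io.BytesIO()
Image.new("RGB", (4, 2), color=(255, 0, 0)).save(buffer, format="PNG")
pixels = decode_image(buffer.getvalue())
print(pixels.shape)
```

With requests, the input could be requests.get(url).content, which returns the response body as bytes.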
I just selected some random images from a German news page.
Edit: Free images from Wikipedia are used now!
Code:
import requests # downloading images
import pandas as pd # csv- / data-input
from scipy.misc import imread # image-decoding -> numpy-array (imread was removed in SciPy 1.2)
import matplotlib.pyplot as plt # only for demo / plotting
# Fake data -> pandas DataFrame
urls_df = pd.DataFrame({'urls': ['https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Rescue_exercise_RCA_2012.jpg/500px-Rescue_exercise_RCA_2012.jpg',
'https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Clinotarsus_curtipes-Aralam-2016-10-29-001.jpg/300px-Clinotarsus_curtipes-Aralam-2016-10-29-001.jpg',
'https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/US_Capitol_east_side.JPG/300px-US_Capitol_east_side.JPG']})
# Download & Decode
imgs = []
for i in urls_df.urls: # iterate over column / pandas Series
    r = requests.get(i, stream=True) # See link for stream=True!
    r.raw.decode_content = True # Content-Encoding
    imgs.append(imread(r.raw)) # Decoding to numpy-array
# imgs: list of numpy arrays with varying shapes of form (x, y, 3)
# as we got 3-color channels
# Beware!: downloading png's might result in a shape of (x, y, 4)
# as some alpha-channel might be available
# For more options: https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.imread.html
# Plot
f, arr = plt.subplots(len(imgs))
for i in range(len(imgs)):
    arr[i].imshow(imgs[i])
plt.show()
Output:
General Overview:
I am creating a graph of a large data set; however, I have created a sample text document so that it is easier to work through the problems.
The data is from an Excel document that will be saved as a CSV.
Problem:
I am able to compile the data and it will graph (see below). However, the way I pull the data will not work for all of the different Excel sheets I am going to pull from.
More Detail of problem:
The Y-values (labeled 'Value' and 'Value1') are being pulled from the Excel sheet using the numbers 26 and 31 (see picture and code).
This is a problem because the values 26 and 31 will not be the same for each graph.
Let's take a look at this so it makes more sense.
Here is my code
import pandas as pd
import matplotlib.pyplot as plt
pd.read_csv('CSV_GM_NB_Test.csv').T.to_csv('GM_NB_Transpose_Test.csv', header=False)
df = pd.read_csv('GM_NB_Transpose_Test.csv', skiprows = 2)
DID = df['SN']
Value = df['26']
Value1 = df['31']
x= (DID[16:25])
y= (Value[16:25])
y1= (Value1[16:25])
"""
print(x,y)
print(x,y1)
"""
plt.plot(x.astype(int), y.astype(int))
plt.plot(x.astype(int), y1.astype(int))
plt.show()
Output:
Data Set:
Below in the comments you will find the 0bin link to my data set; this is because I do not have enough reputation to post two links.
As you can see from the data set:
X- DID = Blue
Y-Value = Green
Y-Value1 = Grey
Troublesome Values = Red
The problem again is that the data for the Y-values are pulled from rows 10 and 11, using the values 26 and 31 under SN.
Let me know if more information is needed.
Thank you
Not sure why you are creating the transposed CSV version. It is also possible to work directly from your original data. For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('CSV_GM_NB_Test.csv', skiprows=8)
data = df.ix[:,19:].T
data.columns = df['SN']
data.plot()
plt.show()
This would give you:
You can use pandas.DataFrame.ix to get a sliced version of your data using integer positions (note that .ix was removed in pandas 1.0; .iloc is the positional indexer in current versions). The [:,19:] says to give you columns 19 onwards. The final .T transposes it. You can then apply the values of the SN column as column headings by assigning them to .columns.
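For readers on a current pandas version, the same slice-and-transpose idea can be sketched with .iloc. The miniature CSV below is a hypothetical stand-in for the real file, so the column offset (1 instead of 19) and the values are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical miniature CSV: an 'SN' label column followed by measurement columns.
csv_text = """SN,a,b,c,d
26,1,2,3,4
31,5,6,7,8
"""
df = pd.read_csv(io.StringIO(csv_text))

data = df.iloc[:, 1:].T    # every column from position 1 onwards, transposed
data.columns = df["SN"]    # label each series by its SN value
print(data)
```

Calling data.plot() followed by plt.show() would then draw one line per SN value, without hard-coding 26 or 31 anywhere.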