I am working with the classic Titanic dataset and trying to apply neural networks. My data comes already split into train and dev sets, but I want to merge the two back together for several reasons (for example, to do my own splitting, etc.).
Is there a way I can merge both datasets?
I have looked around and only found information about how to split a dataset, but I was unable to find how to merge them back together.
Any help?
An MWE is provided below!
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf
import seaborn as sns
# URL address of data
TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"
# Downloading data
train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
test_file_path = tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)
# Reading data
data_train = pd.read_csv(train_file_path)
data_test = pd.read_csv(test_file_path)
MY_DATA= MERGE HERE????? # merge(data_train,data_test)??
I assume data_train and data_test have the same number of columns and the same column names. Then just do:
merged_df = pd.concat([data_train, data_test], axis=0)
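One follow-up worth noting: after the concatenation the row indices from the two frames repeat, so for your own splitting it may help to reset the index. Here is a minimal sketch, assuming you want a fresh 80/20 split; the use of scikit-learn's train_test_split and the 0.2 ratio are illustrative choices, not part of the original question.
from sklearn.model_selection import train_test_split
# ignore_index=True resets the duplicated row index produced by the concatenation
merged_df = pd.concat([data_train, data_test], axis=0, ignore_index=True)
# re-split however you like, e.g. 80/20 with a fixed seed for reproducibility
my_train, my_dev = train_test_split(merged_df, test_size=0.2, random_state=42)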
So I was trying to open image files in Google Colab. My goal is to apply an FFT for image processing. Right now the files are loaded as arrays of numerical values. How exactly would I access an image itself instead of the numerical values?
!nvidia-smi
!pip install tensorflow-gpu
!pip install tensorflow_hub
from __future__ import absolute_import, division, print_function, unicode_literals
import matplotlib.pylab as plt
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import pandas as pd
import os
import cv2
from google.colab import drive
drive.mount('/content/drive')
data_root='/content/drive/My Drive/ML/AlpDatabase/DATASET'
import numpy as np
from matplotlib import pyplot as plt
data = tf.keras.utils.image_dataset_from_directory('/content/drive/My Drive/ML/AlpDatabase/DATASET')
data_iterator = data.as_numpy_iterator()
batch = data_iterator.next()
batch[0].shape
fig, ax = plt.subplots(ncols=4, figsize=(20,20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img.astype(int))
    ax[idx].title.set_text(batch[1][idx])
So if I had to print a certain image from the dataset, how would I accomplish that? My goal is to send the image files to my FFT algorithm and then create a new dataset after processing. In short, I want to read image files from my Drive folder while maintaining their classes.
My Drive folder has a DATASET folder containing A, B, C, D, ... subfolders, each with many images: basically an image dataset of the alphabet.
If I understood correctly, you want to read image files from the Drive folder while maintaining the classes.
The code above works fine for that when I tried replicating the issue. Please check this gist for reference.
You can also use the code below to display images after fetching them from the directory, where class_names will be the names of the folders inside DATASET (A, B, C, D, ...).
data = tf.keras.utils.image_dataset_from_directory('/content/drive/My Drive/MY WORK/dataset/flowers')
class_names = data.class_names
print(class_names)
Output:
['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']  # in your case: ['A', 'B', 'C', 'D', ...]
To display images:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
for images, labels in data.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
Output: a 3x3 grid of sample images, each titled with its class name.
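As for handing the images to an FFT: once you have a NumPy batch from the iterator above, each image is just an array, so NumPy's FFT can be applied directly. A minimal sketch, assuming RGB input; collapsing the channels by averaging is an illustrative choice, not something from the question.
images, labels = batch  # the NumPy batch fetched from data_iterator earlier
# collapse the RGB channels of the first image into one grayscale channel
gray = images[0].mean(axis=-1)
# 2-D FFT, shifted so the zero-frequency component sits in the centre
spectrum = np.fft.fftshift(np.fft.fft2(gray))
# log scale makes the magnitude spectrum visible
plt.imshow(np.log1p(np.abs(spectrum)), cmap='gray')
plt.title(f'FFT of a class-{labels[0]} image')
plt.axis('off')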
It is really difficult to find anyone using R Markdown in a Python IDE (I am using PyCharm) with both R and Python chunks.
Here is my code so far; I am just trying to set up my R Markdown document to use both R and Python code. It seems like my Python chunk doesn't work; any idea why? Thanks!
R environment
library(readODS) # excel data
library(glmmTMB) # mixed models
library(car) # ANOVA on mixed models
library(DHARMa) # goodness of fit of the model
library(emmeans) # post hoc
library(ggplot2) # plots
library(reticulate) # link between R and python
use_python('C:/Users/saaa/anaconda3/envs/Python_projects/python.exe')
Python environment
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
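For reference, here is a minimal sketch of how the two chunks might be declared in the .Rmd file itself; this is an assumption about the intended setup, not code from the question. Each chunk needs its own fenced header, the Python chunk must be declared with {python} rather than {r}, and reticulate must be configured before the first Python chunk runs:
```{r setup}
library(reticulate)
use_python('C:/Users/saaa/anaconda3/envs/Python_projects/python.exe')
```
```{python}
# this chunk is executed by the interpreter configured in the setup chunk
import pandas as pd
print(pd.__version__)
```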
I am doing a project in which I need to estimate the age of an individual, given an X-ray of their hand. I am given a training set, which contains a large collection of images (in a folder on my computer), all numbered, and I am also given a CSV file that maps each image number to two pieces of information: the age (in months), and whether the individual is male (given as "true" or "false"). I believe I have successfully imported both of these into Python (the image folder as well as the CSV file).
I have looked at many TensorFlow tutorials, but I am struggling to figure out how to associate the image numbers with their labels, as well as how to train on the dataset. Any help would be greatly appreciated!
I have attached blocks of my code, as well as how the data is presented to me, up until this point.
import pandas as pd
import numpy as np
import os
import tensorflow as tf
import cv2
from tensorflow import keras
from tensorflow.keras.layers import Dense, Input, InputLayer, Flatten
from tensorflow.keras.models import Sequential, Model
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
import random
%matplotlib inline
import matplotlib.pyplot as plt
-- This simply imports the libraries that I use or anticipate using later on.
plt.figure(figsize=(20,20))
train_images=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for i in range(5):
    file = random.choice(os.listdir(train_images))
    image_path = os.path.join(train_images, file)
    img = mpimg.imread(image_path)
    ax = plt.subplot(1, 5, i + 1)
    ax.title.set_text(file)
    plt.imshow(img)
-- This successfully sets the image folder path and prints 5 random images to test that the loading worked.
A screenshot (not reproduced here) shows an example of how the pictures look.
IMG_WIDTH=200
IMG_HEIGHT=200
img_folder=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/'
-- I believe this sets the dimensions the images will be resized to.
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv(label_file)
print(train_labels)
-- This successfully imports the data from the CSV file, and prints it, to make sure it worked.
If you have any ideas on how to connect these two datasets and train the data, I would greatly appreciate it.
Thank you!
The approach is simple: create a map between the image data and the labels. After that you can build two lists or NumPy arrays and use them to pass the training and label info to your model. The following code should help with that.
import os
import glob
import pandas as pd
import matplotlib.image as mpimg

dic = {}
# assuming you have .png files; otherwise change the extension in the glob pattern
train_images = '/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for file in glob.glob(train_images + '/*.png'):
    b_name = os.path.basename(file).split('.')[0]  # file name without extension, e.g. '1377'
    dic[b_name] = mpimg.imread(file)

dic_label_match = {}
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv(label_file)
for i in range(len(train_labels)):
    # given that your first column is age and the image numbering starts from 1;
    # the key is kept as a string so it matches the file-name keys above
    dic_label_match[str(i + 1)] = str(train_labels.iloc[i, 0])
    # you can also look the column up by name instead:
    # dic_label_match[str(i + 1)] = str(train_labels.iloc[i]['age'])

# now you have two dicts with matching keys;
# build two lists and pass them to the Keras model
train_x = []
label_ = []
for val in dic:
    if val in dic_label_match:
        train_x.append(dic[val])
        label_.append(dic_label_match[val])
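A possible follow-up, sketched under the assumption that the images come in varying sizes: resize them with cv2 (which the question already imports) to the IMG_WIDTH and IMG_HEIGHT constants defined earlier, then stack them into arrays. The model.fit line is illustrative and assumes a compiled Keras model named model, which is not shown here.
import numpy as np
import cv2
# resize every image to a common shape so they can be stacked into one array
train_x = np.array([cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT)) for img in train_x])
label_ = np.array(label_, dtype='float32')  # ages in months as floats
# model.fit(train_x, label_, epochs=10)  # assumes `model` is a compiled Keras regressor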
I'm trying to plot some meteorological data in NetCDF format accessed via the Unidata siphon package.
I've imported what the MetPy docs suggest are the relevant libraries
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
from netCDF4 import num2date
import numpy as np
import xarray as xr
from siphon.catalog import TDSCatalog
from datetime import datetime
import metpy.calc as mpcalc
from metpy.units import units
and I've constructed a query for data as per the Siphon docs
best_gfs = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.xml?dataset=grib/NCEP/GFS/Global_0p25deg/Best')
best_ds = best_gfs.datasets[0]
ncss = best_ds.subset()
query = ncss.query()
query.lonlat_box(north=55, south=20, east=-60, west=-90).time(datetime.utcnow())
query.accept('netcdf4')
query.variables('Vertical_velocity_pressure_isobaric','Relative_humidity_isobaric','Temperature_isobaric','u-component_of_wind_isobaric','v-component_of_wind_isobaric','Geopotential_height_isobaric')
data = ncss.get_data(query)
Unfortunately, when I attempt to parse the dataset using the code from the MetPy docs
data = data.metpy.parse_cf()
I get an error: "AttributeError: NetCDF: Attribute not found"
While attempting to fix this problem, I came across another SO post that seems to describe the same issue, but the solution suggested there (updating MetPy to the latest version) did not work for me. I updated MetPy using conda and got the same error as before. Any other ideas on how to resolve this?
Right now the following code in Siphon
data = ncss.get_data(query)
will return a Dataset object from netcdf4-python. You need one extra step to hand this to xarray, which will make MetPy's parse_cf available:
from xarray.backends import NetCDF4DataStore
ds = xr.open_dataset(NetCDF4DataStore(data))
data = ds.metpy.parse_cf()
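From there the usual MetPy xarray workflow should apply. A short hedged example using one of the variables requested in the query above; the units accessor is part of MetPy's xarray interface:
# pull one of the requested variables and inspect its units
temperature = data['Temperature_isobaric']
print(temperature.metpy.units)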
Code:
from sklearn import svm
import numpy as np
from sklearn import model_selection
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import colors
def iris_type(s):
    class_label = {b'Iris-setosa': 0, b'Iris-versicolor': 1, b'Iris-virginica': 2}
    return class_label[s]

filepath = r'E:\dataset\IRIS\IRIS.csv'  # raw string so the backslashes in the path stay literal
data = np.loadtxt(filepath, dtype=float, delimiter=',', converters={4: iris_type})
When I try to transform the species column of IRIS this way, the returned error is posted as a screenshot (not reproduced here).
The dataset I use was downloaded from Kaggle.
I'd like to convert the species strings to floats.
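A hedged guess at the cause, since the error screenshot is missing: the Kaggle copy of this dataset (Iris.csv) typically has a header row and a leading Id column, so column index 4 holds a measurement rather than the species, and np.loadtxt also chokes on the header. Under that assumption (the Id, SepalLengthCm, ..., Species column layout is my assumption about the file, not something shown in the question), a sketch that skips the header, targets column 5, and tolerates both str and bytes inputs to the converter (NumPy versions differ on which they pass):
def iris_type(s):
    # newer NumPy passes str to loadtxt converters, older versions pass bytes
    if isinstance(s, bytes):
        s = s.decode()
    return {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}[s]

data = np.loadtxt(filepath, dtype=float, delimiter=',',
                  skiprows=1,                 # skip the header row
                  usecols=(1, 2, 3, 4, 5),    # drop the Id column
                  converters={5: iris_type})  # species is column index 5 in the raw file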