How to plot scatterplot using matplotlib from arrays (using strings)? Python - python

I have been trying to plot a 3D scatterplot from a pandas DataFrame (I have tried converting the data to NumPy arrays and to strings before plotting). However, I keep getting the error ValueError: s must be a scalar, or float array-like with the same size as x and y. My Patient ID data is in the format EMR-001, EMR-002, etc. after blanking it out. My Discharge Date data is converted to a string of digits like 20200120. My Diagnosis Code data is a mix of characters like 001 or 10B.
I have also looked at some of the other examples online but have not been able to identify the problem. Could you advise on anything I missed or on code I could use?
I'm using Python 3.9, UTF-8. Thank you in advance!
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#importing csv data via pd
A = pd.read_csv('input.csv') #import file for current master list
Diagnosis_Des = A["Diagnosis Code"]
Discharge_Date = A["Discharge Date"]
Patient_ID = A["Patient ID"]
B = Diagnosis_Des.to_numpy()
#B1 = np.array2string(B)
#print(B.shape)
C = Discharge_Date.to_numpy() #need to change to data format
#C1 = np.array2string(C)
#print(C1)
D = Patient_ID.to_numpy()
#D1 = np.array2string(D)
#print(D.shape)
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D
sequence_containing_x_vals = D
sequence_containing_y_vals = B
print(type(sequence_containing_y_vals))
sequence_containing_z_vals = C
print(type(sequence_containing_z_vals))
plt.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals)
pyplot.show()
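One possible way around the error, sketched on made-up data: the third positional argument of plt.scatter is the marker size s, not a z coordinate, so a 3D axes is needed, and the string columns have to be turned into numbers first. The sample rows, the integer encoding via pd.factorize, the "days since first discharge" scale, and the output file name are all assumptions for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for input.csv, using the column names from the question
df = pd.DataFrame({
    "Patient ID": ["EMR-001", "EMR-002", "EMR-003", "EMR-001"],
    "Diagnosis Code": ["001", "10B", "001", "A12"],
    "Discharge Date": ["20200120", "20200122", "20200125", "20200201"],
})

# Map each distinct string to an integer code; keep the labels for the ticks
patient_codes, patient_labels = pd.factorize(df["Patient ID"])
diagnosis_codes, diagnosis_labels = pd.factorize(df["Diagnosis Code"])

# Parse the dates and express them as days since the earliest discharge
dates = pd.to_datetime(df["Discharge Date"], format="%Y%m%d")
days = (dates - dates.min()).dt.days.to_numpy()

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # 3D axes: its scatter() takes x, y, z
ax.scatter(patient_codes, diagnosis_codes, days)

# Show the original strings on the encoded axes
ax.set_xticks(range(len(patient_labels)))
ax.set_xticklabels(patient_labels)
ax.set_yticks(range(len(diagnosis_labels)))
ax.set_yticklabels(diagnosis_labels)
ax.set_xlabel("Patient ID")
ax.set_ylabel("Diagnosis Code")
ax.set_zlabel("Days since first discharge")
fig.savefig("scatter3d.png")  # hypothetical output name; use plt.show() interactively
```

With the real file, the DataFrame would come from pd.read_csv('input.csv', dtype=str), so that codes such as 001 keep their leading zeros.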

Related

TypeError: '.dt' accessor only available for DataArray with datetime64 timedelta64 dtype or for arrays containing cftime datetime objects

I have searched Stack Overflow for similar questions, but none seems to address this particular problem. When I run the code, I get TypeError: '.dt' accessor only available for DataArray with datetime64 timedelta64 dtype or for arrays containing cftime datetime objects. I have tried tweaking the code in different ways to no avail. Any help at this point would be greatly appreciated. Thanks.
import os
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd
from datetime import datetime
#import ffmpeg
from IPython.display import HTML
from matplotlib import pyplot as plt
from matplotlib import animation
import ipynb
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
%run ../functions.ipynb
# Load Metop-A GOME-2 Level 3 AAI data
file = 'ESACCI-AEROSOL-L3-AAI-GOME2A-1D-20210205-fv1.8.nc'
aai_gome2a = xr.open_dataset(file)
# Load Metop Metop-A, -B & -C netcdf data
ds_a = xr.open_mfdataset('ESACCI-AEROSOL-L3-AAI-GOME2A-1D-2021020*.nc',
                         concat_dim='time',
                         combine='nested')
ds_b = xr.open_mfdataset('ESACCI-AEROSOL-L3-AAI-GOME2B-1D-2021020*.nc',
                         concat_dim='time',
                         combine='nested')
ds_c = xr.open_mfdataset('ESACCI-AEROSOL-L3-AAI-GOME2C-1D-2021020*.nc',
                         concat_dim='time',
                         combine='nested')
# display the variable of interest, i.e absorbing_aerosol_index
aai_a=ds_a['absorbing_aerosol_index']
aai_b=ds_b['absorbing_aerosol_index']
aai_c=ds_c['absorbing_aerosol_index']
# Concatenate the data from the three satellites Metop-A, -B and -C
aai_concat = xr.concat([aai_a,aai_b,aai_c], dim='satellite')
# Retrieve time coordinate information and assign time coordinates for the time dimension
start_day = aai_gome2a.description.split()[4]
time_coords = pd.date_range(datetime.strptime(start_day,'%d-%m-%Y'), periods=len(aai_concat.time), freq='d').strftime("%Y-%m-%d").astype('datetime64[ns]')
# Combine AAI data from the three satellites Metop-A, -B and -C onto one single grid
aai_combined = aai_concat.mean(dim='satellite')
# Visualize AAI data with data from the three satellites Metop-A, -B and C combined on one single grid
visualize_pcolormesh(data_array=aai_combined[1,:,:],
                     longitude=aai_combined.longitude,
                     latitude=aai_combined.latitude,
                     projection=ccrs.PlateCarree(),
                     color_scale='afmhot_r',
                     unit=' ',
                     long_name=aai_a.long_name + ' ' + str(aai_combined.time[0].dt.strftime('%Y-%m-%d').data),
                     vmin=0,
                     vmax=5,
                     lonmin=-50,
                     lonmax=36,
                     latmin=0,
                     latmax=70.,
                     set_global=False)
See expected output here: https://www.canva.com/design/DAE-vuWH6Ak/view
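A likely cause, offered as an assumption since the data files are not available: time_coords is computed but never attached to the array, so aai_combined.time does not hold datetime64 values and the .dt accessor refuses to work. The sketch below reproduces the idea on a tiny made-up DataArray. The array shape and the start date (taken from the file name) are assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for aai_combined: 3 days on a tiny 2 x 2 grid,
# with a 'time' dimension that has no datetime coordinate attached
aai_combined = xr.DataArray(
    np.zeros((3, 2, 2)),
    dims=("time", "latitude", "longitude"),
)

# Daily timestamps starting at an assumed first day (taken from the file name)
time_coords = pd.date_range(
    "2021-02-05", periods=aai_combined.sizes["time"], freq="D"
)

# Attach the timestamps; assign_coords returns a new object, so reassign it
aai_combined = aai_combined.assign_coords(time=time_coords)

# The .dt accessor is now available because 'time' holds datetime64 values
label = aai_combined.time[0].dt.strftime("%Y-%m-%d").item()
print(label)
```

In the question's code, the equivalent step would be an assign_coords(time=time_coords) call on aai_concat before taking the mean over the satellite dimension.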

Converting .txt file to .csv using pandas and then change the csv file to 2D numpy array for plotting?

I am trying to convert data with 2 columns and about 3000 rows to a CSV or Excel file. Later I want to add multiple txt files to the same Excel file and plot all of them in the same graph for comparison. However, I cannot figure out how to change the string type to float. In the CSV file that pandas saves, the values are stored as numbers, and it is possible to make a graph from the CSV without problems.
import pandas as pd
import os
from matplotlib import pyplot as plt
import numpy as np
import csv
os.chdir(r/....)
M1 = pd.read_csv('M1 - 2020_16 jan maybe fbeta.txt', header=0, delimiter=' ', dtype=float)
M1.to_csv('M1.csv')
#print(M1)
M2 = csv.reader(open("M1.csv"))
x = list(M2)
M2 = np.array(x).astype("float")
print(M2)
#plt.plot(M1)
#plt.plot(M2[5,1])
#plt.show()
The error I always get is
ValueError: could not convert string to float:
Can someone please help?
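A plausible explanation, stated as an assumption: csv.reader returns every cell as a string, including the header row and the unnamed index column that to_csv writes by default, and the empty header cell of that index column cannot be converted to float. The sketch below skips the round trip and takes the 2D array straight from the DataFrame. The inline sample data and its column names are made up.

```python
import io
import pandas as pd

# Hypothetical stand-in for the space-delimited text file with a header row
raw = io.StringIO("x y\n0.0 1.5\n0.5 2.5\n1.0 4.0\n")

M1 = pd.read_csv(raw, header=0, sep=" ", dtype=float)

# index=False keeps the extra unnamed index column out of the CSV text
csv_text = M1.to_csv(index=False)

# The DataFrame already holds floats, so the 2D array can come directly from it
M2 = M1.to_numpy()
print(M2.shape, M2.dtype)
```

M2[:, 0] and M2[:, 1] can then be passed to plt.plot as the x and y values.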

Why does Pandas say this data frame has only one column?

I began a Python course on linear and logistic regression, but I am running into what is probably a simple error. I have to work with this data frame:
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
And this is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rwq = pd.read_csv('*filepath*/winequality-red.csv')
rows = len(rwq.index)
cols = rwq.shape[1]
When I print rows and cols, rows correctly prints 1599, but for some reason cols always equals 1 (when in fact there are 12 columns).
I also tried 'len(rwq.columns)' and I still get 1.
Am I doing something wrong or is the problem with the file provided?
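The wine-quality file separates its fields with semicolons rather than commas, so read_csv with the default separator sees one wide column. A minimal sketch on a made-up three-line excerpt in the same layout (the excerpt and its values are assumptions):

```python
import io
import pandas as pd

# Hypothetical excerpt in the same semicolon-separated layout as the file
raw = "fixed acidity;volatile acidity;quality\n7.4;0.7;5\n7.8;0.88;5\n"

default = pd.read_csv(io.StringIO(raw))         # splits on commas only
fixed = pd.read_csv(io.StringIO(raw), sep=";")  # splits on semicolons

print(default.shape[1], fixed.shape[1])
```

For the real file, the same sep=';' argument would go into the existing pd.read_csv call.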

Choosing the correct values in excel in Python

General Overview:
I am creating a graph of a large data set; however, I have created a sample text document so that the problems are easier to work through.
The data is from an Excel document that will be saved as a CSV.
Problem:
I am able to load the data and it will graph (see below). However, the way I pull the data will not work for all of the different Excel sheets I am going to read from.
More detail on the problem:
The Y-values (labeled 'Value' and 'Value1') are being pulled from the Excel sheet using the numbers 26 and 31 (see picture and code).
This is a problem because the values 26 and 31 will not be the same for each graph.
Let's take a look so that this makes more sense.
Here is my code
import pandas as pd
import matplotlib.pyplot as plt
pd.read_csv('CSV_GM_NB_Test.csv').T.to_csv('GM_NB_Transpose_Test.csv', header=False)
df = pd.read_csv('GM_NB_Transpose_Test.csv', skiprows = 2)
DID = df['SN']
Value = df['26']
Value1 = df['31']
x= (DID[16:25])
y= (Value[16:25])
y1= (Value1[16:25])
"""
print(x,y)
print(x,y1)
"""
plt.plot(x.astype(int), y.astype(int))
plt.plot(x.astype(int), y1.astype(int))
plt.show()
Output:
Data Set:
Below in the comments you will find the 0bin link to my data set, because I do not have enough reputation to post two links.
As you can see from the data set:
X- DID = Blue
Y-Value = Green
Y-Value1 = Grey
Troublesome Values = Red
The problem, again, is that the data for the Y-values is pulled from rows 10 and 11 using the values 26 and 31 under SN.
Let me know if more information is needed.
Thank you
Not sure why you are creating the transposed CSV version. It is also possible to work directly from your original data. For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('CSV_GM_NB_Test.csv', skiprows=8)
data = df.iloc[:, 19:].T
data.columns = df['SN']
data.plot()
plt.show()
This would give you:
You can use pandas.DataFrame.iloc to get a sliced version of your data using integer positions (the older .ix indexer was removed in pandas 1.0). The [:, 19:] selects columns 19 onwards. The final .T transposes the result. You can then use the values of the SN column as column headings by assigning them to .columns.
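To make the slicing step concrete, here is a small sketch on made-up data that uses the positional indexer .iloc, which works on current pandas versions. The column names DID_1 to DID_3 and the readings are hypothetical. Only the SN values 26 and 31 come from the question.

```python
import pandas as pd

# Hypothetical miniature of the sheet: an 'SN' column followed by readings
df = pd.DataFrame({
    "SN": [26, 31],
    "DID_1": [10, 20],
    "DID_2": [11, 22],
    "DID_3": [13, 21],
})

# Columns from position 1 onwards, transposed so each SN row becomes a column
data = df.iloc[:, 1:].T
data.columns = df["SN"]

print(data)
```

Because the column labels are read from the SN column, the values 26 and 31 do not need to be hard-coded and may differ from sheet to sheet.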

Pandas Dataframe Data Type Conversion or Isomap Transformation

I load images with scipy's misc.imread, which in my case gives a 2304x3 ndarray. I then append this array to a list and convert the list to a DataFrame, so that I can later apply an Isomap transform to it. My DataFrame has 84 rows/samples (the images in the folder) and 2304 features, where each feature is an array/list of 3 elements. When I try the Isomap transform I get the error:
ValueError: setting an array element with a sequence.
I think the error occurs because the elements of my DataFrame are of object type. First I tried converting each column with to_numeric, but got an error; then I wrote a loop to convert each element to numeric. The results are still of object type. Here is my code:
import pandas as pd
from scipy import misc
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import matplotlib.pyplot as plt
import glob
from sklearn import manifold
samples = []
path = 'Datasets/ALOI/32/*.png'
files = glob.glob(path)
for name in files:
    img = misc.imread(name)
    img = img[::2, ::2]
    x = (img/255.0).reshape(-1,3)
    samples.append(x)
df = pd.DataFrame.from_records(samples, coerce_float = True)
for i in range(0,2304):
    for j in range(0,84):
        df[i][j] = pd.to_numeric(df[i][j], errors = 'coerce')
    df[i] = pd.to_numeric(df[i], errors = 'coerce')
print df[2303][83]
print df[2303].dtype
print df[2303][83].dtype
#iso = manifold.Isomap(n_neighbors=6, n_components=3)
#iso.fit(df)
#manifold = iso.transform(df)
#print manifold.shape
The last four lines are commented out because they give an error. The output I get is:
[ 0.05098039 0.05098039 0.05098039]
object
float64
As you can see, each element of the DataFrame is of type float64, but the whole column is of type object.
Does anyone know how to convert the whole DataFrame to numeric?
Is there another way of applying Isomap?
Do you want to reshape each image to a new shape rather than keep the original one?
If not, you should change the following line in your code
x = (img/255.0).reshape(-1,3)
to
x = (img/255.0).reshape(-1)
so that each image becomes a single flat row of numbers instead of a list of 3-element arrays.
I hope this resolves your issue.
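A rough sketch of the flattening idea, using random arrays in place of the image files. The 48 x 48 RGB size is an assumption inferred from the 2304 rows reported in the question, and the random pixel values are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

samples = []
for _ in range(84):
    # Hypothetical stand-in for one image after img[::2, ::2]: 48 x 48 RGB
    img = rng.integers(0, 256, size=(48, 48, 3))
    # Flatten to a single 1D row of floats per image
    samples.append((img / 255.0).reshape(-1))

# Stack the rows into one 2D float array: 84 samples by 6912 features
df = pd.DataFrame(np.stack(samples))
print(df.shape, df.dtypes.unique())
```

A purely numeric frame like this one can be passed to an estimator's fit method, whereas object columns holding small arrays cannot.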
