I am trying to run basic from-scratch code for linear regression, but it gives me an error even though the CSV file contains a column header named "studytime".
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('/Users/brasilgu/Downloads/student/student-por.csv')
plt.scatter(data.studytime, data.score)
plt.show()
Check your data with
print(data.columns)
to make sure there are no typos etc. If that doesn't reveal the problem, you'll need to post a sample of the data so others can reproduce it.
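One common cause of a "missing" column that is clearly in the header is a delimiter mismatch. If the file actually uses semicolons (open it in a text editor to check), read_csv with its default comma separator puts the whole header into a single column name. The sketch below assumes a semicolon-delimited file; the two-row sample is hypothetical, not the real student-por.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample standing in for the real file.
raw = "studytime;score\n2;11\n3;14\n"

# Default sep="," squashes the whole header into one column name.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.columns.tolist())  # ['studytime;score']

# Passing the real delimiter gives separate, attribute-accessible columns.
data = pd.read_csv(io.StringIO(raw), sep=";")
data.columns = data.columns.str.strip()  # drop any stray spaces around header names
print(data.columns.tolist())  # ['studytime', 'score']
print(data.studytime.tolist())  # [2, 3]
```

If print(data.columns) shows one long name containing semicolons, adding sep=";" to the read_csv call is the fix to try.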
I have the following question:
What can you tell about the relationship between time and speed? Is there a best time of day to connect? Has it changed throughout the years?
This is my dataframe:
[screenshot: my columns]
[screenshot: data]
Does anyone have any suggestions on how I should approach this question?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('/Users/dimagoroh/Desktop/data_vis/big_file.csv', low_memory=False)
sns.lmplot(x="hours",y="speed",data=df)
I'm trying to make a plot but get this error. I think I need to convert the hour column to a different data type; right now it is set as object.
Please post the error you get. From the data, I think you need to pass x="hour" rather than x="hours". Also try converting the column:
df['hour'] = pd.to_datetime(df['hour'])
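For the "best time of day" part of the question, one approach is to turn hour into a plain integer hour of day and average speed per hour. This is a sketch that assumes the hour column holds strings like "08:00". The sample rows are hypothetical, and the format string needs to match the real data.

```python
import pandas as pd

# Hypothetical sample: 'hour' read as strings, as in the question.
df = pd.DataFrame({
    "hour": ["08:00", "08:00", "14:00", "20:00", "20:00"],
    "speed": [30.0, 34.0, 22.0, 18.0, 20.0],
})

# Parse the strings (assumed "HH:MM" format) and keep only the hour of day (0-23)
# as an integer, which a regression plot can use as a numeric x value.
df["hour"] = pd.to_datetime(df["hour"], format="%H:%M").dt.hour

# Average speed per hour of day: a first look at the "best time to connect".
mean_speed = df.groupby("hour")["speed"].mean()
print(mean_speed)
print(mean_speed.idxmax())  # 8
```

With the integer column in place, sns.lmplot(x="hour", y="speed", data=df) receives a numeric x. For the "throughout the years" part, the same groupby could also take a year key, provided the data has a date column to extract it from.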
When I try this code:
import pandas as pd
breastcancer=pd.read_excel("breastCancer.xlsx")
print(breastcancer.Class)
I get an AttributeError.
If I use this:
import pandas as pd
breastcancer=pd.read_excel("breastCancer.xlsx")
print(breastcancer['Class'])
I get a KeyError.
It's because there is no column called Class in your spreadsheet. You need to spell it exactly, using the same case (e.g. if it's CLASS in the spreadsheet you can't access it using breastcancer['class']).
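To find the header's exact spelling, you can list the columns and search them case-insensitively, ignoring surrounding spaces. A minimal sketch; the frame below is a hypothetical stand-in for the spreadsheet, built directly so no Excel file is needed.

```python
import pandas as pd

# Hypothetical frame whose header differs from 'Class' only by case and a trailing space.
breastcancer = pd.DataFrame({"Age": [40, 55], "CLASS ": ["benign", "malignant"]})

print("Class" in breastcancer.columns)  # False, so breastcancer['Class'] raises KeyError
print(breastcancer.columns.tolist())    # ['Age', 'CLASS ']

# Find the real header name, comparing case-insensitively and ignoring surrounding spaces.
matches = [c for c in breastcancer.columns if c.strip().lower() == "class"]
print(matches)                             # ['CLASS ']
print(breastcancer[matches[0]].tolist())   # ['benign', 'malignant']
```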
When I write and run the following code, everything works fine, but I have a question I hope someone can confirm for me:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import pandas as pd
import seaborn as sns
from pydataset import data
sns.set_palette("deep", desat=.6)
sns.set_context(rc={"figure.figsize": (8, 4)})
faithful = data('faithful')
faithful.head(10)
All works fine. But the dataset 'faithful' in the penultimate line above is one I have not loaded, copied, or linked to via a URL. Even so, it runs and reads all the data. I assume this dataset is included by default in some library? Which one? Where is it located? How can I confirm or verify this? Is there a command for it? Thanks!
You are importing the built-in datasets from the pydataset module in your 7th line:
from pydataset import data
If you run the data() command, you will see all 750+ datasets contained in this module; 'faithful' is one of them.
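To verify where a name such as data comes from, Python's standard library can report the module that defines an object and the file an installed package is loaded from. A minimal sketch using importlib.util.find_spec and inspect.getmodule; the helper names where_is and provider_of are hypothetical, and the demo uses the standard json module so it runs without pydataset installed.

```python
import importlib.util
import inspect
import json


def where_is(module_name):
    """Return the file an installed module is loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def provider_of(obj):
    """Return the name of the module in which a function or class is defined."""
    module = inspect.getmodule(obj)
    return module.__name__ if module is not None else None


# Demonstrated on the standard library so it runs anywhere:
print(provider_of(json.dumps))  # json
print(where_is("json"))         # path ending in json/__init__.py

# With pydataset installed, the same calls answer the question for `data`:
# from pydataset import data
# print(provider_of(data), where_is("pydataset"))
```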
I want to find the median of a dataset using np.median, but for unexpected reasons the numpy results differ from each other. If I convert the dataframe column into a list and then use np.median(li), I get 1.0791015625. However, if I use np.median(df['diesel']), I get 1.079. Interestingly, statistics.median() works for both versions (using a list or a dataframe). Does anyone know what I did wrong or what could cause this?
import pandas as pd
import numpy as np
import statistics
import math
df = pd.read_csv("2020-08-09-prices.csv",sep=',', usecols=['diesel'], dtype={'diesel': np.float16})
df.info()
li=df['diesel'].tolist()
print(df.describe())
print(np.median(li))
print(statistics.median(df['diesel']))
print(np.median(df['diesel']))
This is where I got the csv file from: https://dev.azure.com/tankerkoenig/_git/tankerkoenig-data?path=%2Fprices%2F2020%2F08
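A likely cause is the dtype={'diesel': np.float16} argument. The float16 value nearest to 1.079 is exactly 1.0791015625. np.median on the float16 column returns a float16, which NumPy prints with the fewest digits that identify it (1.079). tolist(), by contrast, yields ordinary Python floats, so the median comes back as float64 and prints every digit. The two are the same number. The prices below are hypothetical stand-ins for the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical prices stored as float16, mirroring dtype={'diesel': np.float16}.
prices16 = pd.Series([1.059, 1.079, 1.099], dtype=np.float16)

median16 = np.median(prices16)           # result stays float16, printed compactly
median64 = np.median(prices16.tolist())  # Python floats, so the median is float64
print(median16, median64)                # 1.079 1.0791015625
print(float(median16) == float(median64))  # True: same value, different display

# Reading the column as float64 (the default) stores 1.079 as closely as float64 allows.
prices64 = pd.Series([1.059, 1.079, 1.099])
print(np.median(prices64))  # 1.079
```

Dropping the dtype argument (or using np.float64) avoids the float16 rounding, and both ways of calling np.median then print the same thing.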
When I run this code (in Python 3):
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize']=(20.0,10.0)
url="https://github.com/RupeshMohan/Linear_Regression/blob/master/headbrain.csv"
dataset=pd.read_csv(url,names=names)
print(dataset.shape)
dataset.head()
I get:
NameError: name 'names' is not defined
A NameError occurs when the Python interpreter encounters a name that has not been defined. In your code, this happens on the 7th line, in the read_csv call:
dataset=pd.read_csv(url,names=names)
You pass names=names, but your code never defines a variable called names. The names parameter of read_csv() is the list of column names to use, so you need to create that list first and then pass it as names.
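A minimal sketch of the fix: define the list before calling read_csv. The column names and data rows below are hypothetical placeholders, not the real contents of headbrain.csv. If the file already has a header line, header=0 tells pandas to replace it with names; without it, the original header would be read as a data row.

```python
import io

import pandas as pd

# Hypothetical column names; check the real file's header before choosing them.
names = ["gender", "age_range", "head_size", "brain_weight"]

# Hypothetical CSV text that already has its own header line.
csv_text = "Gender,Age Range,Head Size,Brain Weight\n1,1,4000,1300\n2,1,3500,1200\n"

# header=0: treat line 0 as the header and replace it with `names`.
dataset = pd.read_csv(io.StringIO(csv_text), names=names, header=0)
print(dataset.shape)  # (2, 4)
print(dataset.columns.tolist())
```

Separately, a github.com URL containing /blob/ serves GitHub's HTML page for the file rather than the CSV itself, so pointing read_csv at the file's "Raw" link is likely needed as well.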