Is the dataset "faithful" preloaded by default in any library? - python

When I run the following code, everything works, but I have a question that I hope someone can answer:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import pandas as pd
import seaborn as sns
from pydataset import data
sns.set_palette("deep", desat=.6)
sns.set_context(rc={"figure.figsize": (8, 4)})
faithful = data('faithful')
faithful.head(10)
Everything works. But in the second-to-last line above, I use the dataset 'faithful' without having loaded it, copied it, or linked to a URL to access it. Nevertheless, the code runs and reads all the data. I assume this dataset is included by default in some library. Which one? Where is it located? How can I verify this? Is there a command for it? Thanks!

You are importing the built-in datasets from the pydataset module when you run your 7th line:
from pydataset import data
If you run the data() command, you will see all of the 750+ datasets contained in this module. The 'faithful' dataset is one of them.
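To check where a package such as pydataset lives on disk, one option is to ask Python's import machinery. The following is a minimal sketch that uses only the standard library; the helper name package_location is hypothetical, and it assumes nothing about pydataset beyond its import name.

```python
import importlib.util
import os


def package_location(name):
    """Return the directory a top-level package or module is loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


location = package_location("pydataset")
if location is None:
    print("pydataset is not installed in this environment")
else:
    print("pydataset is installed at:", location)
```

The printed directory is where the package files, including whatever data files it ships with, are stored for the current interpreter.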

Related

'DataFrame' object without attribute error

I am trying to run some basic from-scratch code for linear regression. It raises an attribute error for "studytime", even though the CSV file contains a column header with exactly that name.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('/Users/brasilgu/Downloads/student/student-por.csv')
plt.scatter(data.studytime, data.score)
plt.show()
Check your data with
print(data.columns)
to make sure the column names contain no typos, stray whitespace, or similar problems. If they look correct, you will need to post a sample of the data so that the error can be reproduced.
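As an illustration of what print(data.columns) can reveal, here is a small sketch with made-up in-memory data. It assumes, purely for the example, that the file is separated by semicolons rather than commas; the actual cause in the question may differ.

```python
import io

import pandas as pd

# Hypothetical file contents: semicolon-separated instead of comma-separated.
raw = "studytime;score\n2;10\n3;14\n"

# With the default comma separator the whole header becomes one column name,
# so attribute access such as wrong.studytime fails.
wrong = pd.read_csv(io.StringIO(raw))
print(list(wrong.columns))
print(hasattr(wrong, "studytime"))

# Passing the separator explicitly yields the expected columns.
right = pd.read_csv(io.StringIO(raw), sep=";")
right.columns = right.columns.str.strip()  # also guards against stray whitespace
print(list(right.columns))
print(right["studytime"].tolist())
```

Bracket access (right["studytime"]) is generally more robust than attribute access, since it also works for names that contain spaces or clash with DataFrame methods.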

Is there a way to download a sample CSV file

I used a sample of a CSV file to build some tables in a Jupyter notebook. I now need to download that sample as a CSV file so I can look at it in Excel. Is there a way to download the sample?
I need to download lf if possible.
Here is my code:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import io
import requests
df = pd.read_csv("diamonds.csv")
lf = df.sample(5000, random_state=999)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("seaborn")
lf.sample(5000, random_state=999)
The sample has to be a DataFrame before you can export it. df.sample() already returns one, so lf can be exported directly:
lf.to_csv("file_name.csv")
Let me know if it works.
Answer from here:
import urllib.request
urllib.request.urlretrieve("http://jupyter.com/diamond.csv", "diamond.csv")
If what you mean by download is exporting the DataFrame to a spreadsheet format, pandas has functions for that:
import pandas as pd
df = pd.read_csv("diamond.csv")
# do your stuff
df.to_csv("diamond2.csv") # export to CSV under a different name
df.to_csv("folder/diamond2.csv") # export to CSV inside an existing folder
df.to_excel("diamond2.xlsx") # export to Excel
The file will appear in the same directory as your Jupyter notebook.
You can also specify the directory:
df.to_csv('D:/folder/diamond.csv')
To check your current working directory, use:
import os
print(os.getcwd())
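Applied to the sampled frame from the question, the export could look like the sketch below. The small DataFrame stands in for diamonds.csv and its values are made up; the output location (the system temporary directory) and the file name are assumptions chosen only so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the contents of diamonds.csv.
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, 0.31, 0.24, 0.26],
    "price": [326, 326, 334, 335, 336, 337],
})

# Same pattern as in the question, with a smaller sample size.
lf = df.sample(3, random_state=999)

# index=False leaves out the row index, which is usually not wanted in Excel.
out_path = os.path.join(tempfile.gettempdir(), "diamonds_sample.csv")
lf.to_csv(out_path, index=False)
print("Sample written to:", out_path)
```

The resulting file can then be opened in Excel or downloaded through the Jupyter file browser.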

How do I assign a column of a CSV file to a list in Python?

I have a CSV that I want to graph.
However, to get this graph, I need to first assign a column to a list (or array) and then go on from there. I need to assign the first column to said list. In the said column, there are many repeats of the numbers 1 through 45 (so in code that would be range(1,46)).
So far, I have written this:
# for weekly sales against Date
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
NOTE: walmart (the variable holding the dataset) has already been read with pd.read_csv. That step works and produces output.
I do not know what to do from here. I also want to plot the data separately for each store.
The data set can be found: https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction
There are many ways to do this, but the easiest that comes to mind is using a pandas DataFrame.
First you need to install pandas in your environment. I see you tagged anaconda, so this would be something like:
$ conda install pandas
Then import it in your Python file (presumably a Jupyter notebook):
import pandas as pd
Then import the CSV into a DataFrame using the built-in read_csv function (it can do many useful things, so check out the docs).
Suppose you want to import just two columns, say those at positions 3 and 5 (usecols is zero-based, so these are the fourth and sixth columns), and then plot them. If the first row of your CSV contains the header (say 'col3' and 'col5'), it is read automatically and used for the column names. If you want the columns to be named something else, pass the option names=['newname3', 'newname5'] together with header=0, so that the existing header row is replaced instead of being read as data (skiprows=1 has the same effect when names is given):
data = pd.read_csv('path/to/my.csv', usecols=[3,5], names=['col1', 'col2'], header=0)
Then you can access the columns by name and plot them using data['colname']:
import matplotlib.pyplot as plt
plt.scatter(data['col1'], data['col2'])
plt.show()
Or you can use the built-in plotting method of pandas DataFrames:
data.plot.scatter(x='col1', y='col2')
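For the original goal of assigning a column to a list and working per store, a minimal sketch might look as follows. The column names Store, Date and Weekly_Sales are taken from the question; the rows are made up, and groupby is one possible alternative to filtering in a loop, not necessarily what the asker ended up using.

```python
import io

import pandas as pd

# Made-up rows using the column names from the question.
csv_text = (
    "Store,Date,Weekly_Sales\n"
    "1,05-02-2010,100.0\n"
    "1,12-02-2010,110.0\n"
    "2,05-02-2010,200.0\n"
)
walmart = pd.read_csv(io.StringIO(csv_text))

# Assign the first column to a plain Python list.
store_column = walmart["Store"].tolist()
print(store_column)

# Split the rows by store in one pass instead of filtering 45 times.
sales_by_store = {
    store: group["Weekly_Sales"].tolist()
    for store, group in walmart.groupby("Store")
}
print(sales_by_store)
```

Each entry of sales_by_store can then be passed to a plotting call, one store at a time.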
I have found out what I need to do to get this to work. The following code describes my situation.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
    w = b[b.Store == printval]
    ws = w["Weekly_Sales"]
    tp = w["Date"]
    plt.scatter(tp, ws)
    plt.xlabel('Date')
    plt.ylabel('Weekly Sales')
    plt.title('Store_' + str(printval))
    plt.savefig('Store_'+ str(printval) + '.png') # To save the file if needed
    plt.show()
Again, I have already imported the CSV file, and associated it to walmart. There was no error when doing that.
Again, the dataset can be found in https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction.

How do I import datasets in Python?

I am trying to import some datasets in my code. I need help, because I have tried a lot of tutorials and web pages and I am still getting errors. I use the Spyder IDE and Python 3.7:
import numpy as np
import pandas as pd
import tensorflow as tf
import os
dts1=pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
dts1
This works for me. If you are still experiencing errors, please post them.
import pandas as pd
# Read data from the file 'sample_submission.csv'
# (the absolute path is written as a raw string so the backslashes are kept as-is)
# read_csv also has options to control delimiters, rows, and column names
data = pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
# Preview the first 5 lines of the loaded data
print(data.head())
Try these alternative ways of writing the path:
pd.read_csv("C:\\Users\\Cucu\\Desktop\\sample_submission.csv")
pd.read_csv("C:/Users/Cucu/Desktop/sample_submission.csv")
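The three spellings of the path used in these answers refer to the same file. The sketch below checks this with the standard library only, and also shows why a plain (non-raw) string with single backslashes fails in Python 3; the variable names are chosen for illustration.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\Cucu\Desktop\sample_submission.csv"          # raw string
escaped = "C:\\Users\\Cucu\\Desktop\\sample_submission.csv"   # escaped backslashes
forward = "C:/Users/Cucu/Desktop/sample_submission.csv"       # forward slashes

# The raw and escaped forms are the very same string.
print(raw == escaped)

# Windows treats both separators alike, so the paths compare equal.
print(PureWindowsPath(raw) == PureWindowsPath(forward))

# A normal string literal with single backslashes does not even compile,
# because \U starts a Unicode escape sequence.
source = r'p = "C:\Users\Cucu"'
try:
    compile(source, "<example>", "exec")
    caught = False
except SyntaxError as err:
    caught = True
    print("SyntaxError:", err.msg)
```

Any of the three valid forms can be passed to pd.read_csv.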

Set matplotlib backend from Pandas

I am currently facing the following issue. I have a couple of Python scripts that plot some useful information using the Python module Pandas, which uses Matplotlib.
As far as I understand, Matplotlib lets you set its backend as described in the accepted answer to this question.
I would like to set the matplotlib backend from Pandas:
Is it possible?
How can I do it?
EDIT 1:
By the way my code looks like:
import pandas as pd
from pandas import DataFrame, Series
class MyPlotter():
    def plot_from_file(self, stats_file_name, f_name_out, names,
                       title='TITLE', x_label='x label', y_label='y label'):
        df = pd.read_table(stats_file_name, index_col=0, parse_dates=True,
                           names=names)
        plot = df.plot(lw=2,colormap='jet',marker='.',markersize=10,title=title,figsize=(20, 15))
        plot.set_xlabel(x_label)
        plot.set_ylabel(y_label)
        fig = plot.get_figure()
        fig.savefig(f_name_out)
        plot.cla()
I've just applied the solution posted in this question and it worked.
In other words, my imports originally looked like this:
import pandas as pd
from pandas import DataFrame, Series
After applying the solution, the imports look like this:
import pandas as pd
from pandas import DataFrame, Series
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
I know I am answering my own question, but I am doing so in case someone can find it useful.
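For reference, a complete run of this approach might look like the sketch below. It assumes matplotlib and pandas are installed; the DataFrame values, the output file name, and the use of the system temporary directory are made up for the example. The important point is that matplotlib.use() is called before matplotlib.pyplot is imported.

```python
import os
import tempfile

import matplotlib
matplotlib.use("pdf")  # choose the backend before pyplot is imported
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for the statistics file.
df = pd.DataFrame({"a": [1, 3, 2], "b": [2, 1, 3]})

# pandas draws through matplotlib, so it uses the backend selected above.
ax = df.plot(lw=2, marker=".", title="TITLE")
ax.set_xlabel("x label")
ax.set_ylabel("y label")

fig = ax.get_figure()
out_path = os.path.join(tempfile.gettempdir(), "backend_example.pdf")
fig.savefig(out_path)
plt.close(fig)

print("Backend in use:", matplotlib.get_backend())
```

Because the pdf backend is non-interactive, the script can run on a machine without a display.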
