Is there a way to download a sample CSV file - python

I used a sample of a CSV file to build some tables in a Jupyter notebook. I now need to download that sample as a CSV file so I can look at it in Excel. Is there a way to do that?
I need to download lf if possible.
Here is my code:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import io
import requests
df = pd.read_csv("diamonds.csv")
lf = df.sample(5000, random_state=999)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("seaborn")
lf.sample(5000, random_state=999)

df.sample() already returns a DataFrame, so you can export lf directly:
lf.to_csv("file_name.csv")
Let me know if it works.

Answer from here:
import urllib.request
urllib.request.urlretrieve("http://jupyter.com/diamond.csv", "diamond.csv")

If by download you mean exporting the DataFrame to a spreadsheet format, pandas has functions for that:
import pandas as pd
df = pd.read_csv("diamond.csv")
# do your stuff
df.to_csv("diamond2.csv") # export to CSV under a different name
df.to_csv("folder/diamond2.csv") # export to CSV inside an existing folder
df.to_excel("diamond2.xlsx") # export to Excel (needs an Excel writer such as openpyxl installed)
The file will appear in the same directory as your Jupyter notebook.
You can also specify the directory:
df.to_csv('D:/folder/diamond.csv')
To check your current working directory, you can use:
import os
print(os.getcwd())
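As a minimal sketch of the export step applied to a sample like lf: the small table, the temporary folder, and the file name diamonds_sample.csv below are made up for illustration and stand in for the real diamonds.csv data. Passing index=False is a choice made here so that the row index does not show up as an extra column in Excel.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the diamonds data; the real file has more columns.
df = pd.DataFrame(
    {"carat": [0.23, 0.21, 0.29, 0.31], "price": [326, 326, 334, 335]}
)

# Take a reproducible random sample, as in the question (2 rows here, not 5000).
lf = df.sample(2, random_state=999)

# Write the sample to a CSV file in a temporary folder (assumed location).
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "diamonds_sample.csv")
lf.to_csv(out_path, index=False)  # index=False leaves out the row index

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(out_path)
print(round_trip)
```

The printed path shows where the file ended up, so it can be opened in Excel from there.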

Related

How do I assign a column in a csv file by python?

I have a CSV that I want to graph.
However, to get this graph, I need to first assign a column to a list (or array) and then go on from there. I need to assign the first column to said list. In the said column, there are many repeats of the numbers 1 through 45 (so in code that would be range(1,46)).
So far, I have written this:
# for weekly sales against Date
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
NOTE: walmart (the variable holding the dataset) has already been read here by pd.read_csv. It works and an output has been made.
I do not know what to do from here. I want to graph this as well based on the store.
The data set can be found: https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction
There are many ways to do this, but the easiest that comes to mind is using a pandas DataFrame.
First you need to install pandas in your environment. I see you tagged anaconda, so this would be something like:
$ conda install pandas
Then import it in your Python file (presumably a Jupyter notebook):
import pandas as pd
Then import the CSV into a DataFrame using the built-in read_csv function (you can do many cool things with it, so check out the docs).
In your case, assume you want to import just columns number 3 and 5 and then plot them. If the first row of your CSV contains the header (say 'col3' and 'col5'), it is read automatically and stored as the column names. If you want to skip the header row, add the option skiprows=1; if you want the columns to be named something else, use the option names=['newname3', 'newname5']:
data = pd.read_csv('path/to/my.csv', usecols=[3,5], names=['col1', 'col2'])
Then you can access the columns by name and plot them using data['colname']:
import matplotlib.pyplot as plt
plt.scatter(data['col1'], data['col2'])
plt.show()
Or you can use the built-in plotting function of pandas DataFrames:
data.plot.scatter(x='col1', y='col2')
I have found out what I need to do to get this to work. The following code describes my situation.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
    w = b[b.Store == printval]
    ws = w["Weekly_Sales"]
    tp = w["Date"]
    plt.scatter(tp, ws)
    plt.xlabel('Date')
    plt.ylabel('Weekly Sales')
    plt.title('Store_' + str(printval))
    plt.savefig('Store_'+ str(printval) + '.png') #To save the file if needed
    plt.show()
Again, I have already imported the CSV file and assigned it to walmart. There was no error when doing that.
Again, the dataset can be found in https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction.
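The per-store split in the loop above can also be sketched with groupby, which hands back one sub-table per store without building the list a first. This assumes the columns Store, Date and Weekly_Sales used above; the three-row table is invented for illustration and is not the Kaggle data.

```python
import pandas as pd

# Invented toy data with the column names used above (not the Kaggle data).
walmart = pd.DataFrame(
    {
        "Store": [1, 1, 2],
        "Date": ["05-02-2010", "12-02-2010", "05-02-2010"],
        "Weekly_Sales": [100.0, 110.0, 200.0],
    }
)

# groupby yields (store number, rows for that store) pairs.
per_store = {}
for store, rows in walmart.groupby("Store"):
    per_store[store] = rows[["Date", "Weekly_Sales"]]
    print("Store_" + str(store), len(rows), "rows")
```

Inside the loop, rows["Date"] and rows["Weekly_Sales"] can be passed to plt.scatter in the same way as tp and ws above.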

Is the dataset "faithful" preloaded by default in any library?

When I write and run the following code, everything works fine, but I have a question that I hope someone can answer for me:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import pandas as pd
import seaborn as sns
from pydataset import data
sns.set_palette("deep", desat=.6)
sns.set_context(rc={"figure.figsize": (8, 4)})
faithful = data('faithful')
faithful.head(10)
It all works fine. But in the penultimate line above, I use the dataset 'faithful' without having loaded it, copied it, or linked to a URL to access the data. Nevertheless, the code runs and reads all the data. I assume this dataset is included by default in some library. Which one? Where is it located? How can I verify this? Is there a command for it? Thanks!
You are importing the built-in datasets from the pydataset module when you run your 7th line:
from pydataset import data
If you run the data() command, you will see all the 750+ datasets contained in this module. The 'faithful' dataset is among them.
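One way to check where an installed package lives on disk is the standard library's importlib. The sketch below uses the json module as a stand-in so that it runs anywhere; the helper name module_location is made up here. Passing "pydataset" instead should, assuming the package is installed, point at its installed location.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# 'json' is only a stand-in; try "pydataset" in the notebook from the question.
print(module_location("json"))
print(module_location("surely_not_an_installed_module"))
```

A result of None means Python cannot find a module of that name in the current environment.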

How do I import datasets in Python?

I am trying to import some datasets in my code. I need help because I have tried a lot of tutorials and web pages and I am still getting errors. I use the Spyder IDE and Python 3.7:
import numpy as np
import pandas as pd
import tensorflow as tf
import os
dts1=pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
dts1
This works for me. If you are still experiencing errors, please post them.
import pandas as pd
# Read data from the file 'sample_submission.csv'
# (given here as an absolute path)
# read_csv also has options to control delimiters, rows, and column names
data = pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
# Preview the first 5 lines of the loaded data
print(data.head())
Try these other ways of writing the path:
pd.read_csv("C:\\Users\\Cucu\\Desktop\\sample_submission.csv")
pd.read_csv("C:/Users/Cucu/Desktop/sample_submission.csv")
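To see why these spellings are interchangeable, the sketch below compares them with pathlib. PureWindowsPath is used so that the comparison behaves the same on any operating system. The last part illustrates, as an assumption about what commonly goes wrong, how a plain string with single backslashes can be silently altered by escape sequences; the path C:\temp\new.csv is hypothetical.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\Cucu\Desktop\sample_submission.csv"         # raw string
escaped = "C:\\Users\\Cucu\\Desktop\\sample_submission.csv"  # doubled backslashes
forward = "C:/Users/Cucu/Desktop/sample_submission.csv"      # forward slashes

# The raw and escaped spellings are the very same string.
print(raw == escaped)

# Windows paths accept both separators, so all three name the same file.
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)
print(same_path)

# Hypothetical path written without the r prefix: "\t" turns into a tab
# and "\n" into a newline, so the string no longer names the intended file.
broken = "C:\temp\new.csv"
print("\t" in broken, "\n" in broken)
```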

How to extract the name of a file uploaded in a Jupyter notebook using Python?

My first question here.
I have been working with Python in a Jupyter notebook for a personal project. My code lets users dynamically select a CSV file on which they wish to test it. However, I am not sure how to extract the name of the file once it has been uploaded. The code is as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import io
from google.colab import files
from scipy import stats
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['TestData.csv']))
df.head()
.
.
.
As you can see, after the upload when I try to read the file, I have to type its name manually in the code. Is there a way to automatically capture the name of the file in a variable and then I can use the same while calling the pandas read function?
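A possible approach, assuming files.upload() returns a dictionary that maps each uploaded file name to its bytes (which is what the indexing uploaded['TestData.csv'] above suggests): take the name from the dictionary's keys. The dictionary below is a hand-made stand-in so the sketch runs outside Colab, and first_upload is a hypothetical helper name.

```python
import io


def first_upload(uploaded):
    """Return (file name, in-memory buffer) for the first uploaded file."""
    filename = next(iter(uploaded))  # first key of the dict is the file name
    return filename, io.BytesIO(uploaded[filename])


# Stand-in for the result of files.upload(); in Colab, use the real return value.
uploaded = {"TestData.csv": b"a,b\n1,2\n"}

filename, buffer = first_upload(uploaded)
print(filename)
```

In the notebook, the buffer can then be passed to pd.read_csv(buffer) in place of the hard-coded io.BytesIO(uploaded['TestData.csv']).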

Issue with reading a text file in pandas

I am trying to import a txt file which has around 56 columns with different data types.
A few columns have values with the prefix 000, which I cannot see once the data has been imported.
I am also getting the error message "specify dtype option on reading or set low_memory=false".
Values in certain columns have changed to "NaN" and "4.40578e+01", which is not correct.
I want the data to be imported and displayed correctly.
This is the code that I am using:
from os import path
import numpy as np
import pandas as pd
df=pd.read_csv(r"C:\Users\abc\desktop\file.txt",sep=",")
df.head()
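A common way to keep leading zeros is to ask pandas to read the affected columns as text via the dtype option, which also removes the type guessing for those columns. The sketch below uses a two-row inline table in place of file.txt; the column names code and amount are invented for illustration.

```python
import io

import pandas as pd

# Invented stand-in for file.txt: 'code' carries leading zeros.
text = "code,amount\n000123,44.0578\n000045,12.5\n"

# Default parsing guesses a numeric type and drops the leading zeros.
guessed = pd.read_csv(io.StringIO(text), sep=",")
print(guessed["code"].tolist())

# Reading 'code' as text keeps the values exactly as written in the file.
kept = pd.read_csv(io.StringIO(text), sep=",", dtype={"code": str})
print(kept["code"].tolist())
```

Passing dtype=str reads every column as text, which may be simpler for a 56-column file if the numeric columns are converted afterwards.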
