CSV read error in Python using pandas - python

I want to read a CSV file:
import pandas as pd
import numpy as np
import matplotlib as plt
from pandas import DataFrame
df = pd.read_csv(r'C:\Andy\DataScience\python\Loan_Prediction\Train.csv')
df.head(10)
But I get the error below:
IOError: File Train.csv does not exist
But the file does exist in the location.

If you use backslashes in a regular (non-raw) string, you must escape every one, because the backslash is Python's escape character. Do not combine the r prefix with doubled backslashes, or the path will contain literal double backslashes.
import pandas as pd
import numpy as np
import matplotlib as plt
from pandas import DataFrame
df = pd.read_csv('C:\\Andy\\DataScience\\python\\Loan_Prediction\\Train.csv')
df.head(10)

read_csv could not find the path to the CSV file. Try giving the path with forward slashes:
import pandas as pd
import numpy as np
import matplotlib as plt
from pandas import DataFrame
df = pd.read_csv('C:/Andy/DataScience/python/Loan_Prediction/Train.csv')
If it still raises an error, you can try doubling the slashes, although forward slashes are not escape characters in Python, so this is rarely needed.
df = pd.read_csv('C://Andy//DataScience//python//Loan_Prediction//Train.csv')
df.head(10)
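Since the raw-string path in the question is already valid Python, it can help to check whether the file really sits where the path says. The sketch below is one way to do that; the helper name describe_path and the file name train.csv.txt are hypothetical, chosen to mimic a file whose real extension is hidden by the file manager.

```python
from pathlib import Path
import tempfile


def describe_path(csv_path):
    """Return a short diagnosis of why a path may fail to open."""
    p = Path(csv_path)
    if p.is_file():
        return "ok"
    if not p.parent.is_dir():
        return "missing folder: " + str(p.parent)
    # The folder exists but the file does not: list what is really there,
    # which often reveals a different name or a hidden extension.
    names = sorted(child.name for child in p.parent.iterdir())
    return "file not found; folder contains: " + ", ".join(names)


# Demo in a temporary folder (stand-in for C:\Andy\...\Loan_Prediction).
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "train.csv.txt"   # what is actually on disk
    real.write_text("a,b\n1,2\n")
    wanted = Path(tmp) / "Train.csv"     # what the script asks for
    result_missing = describe_path(wanted)
    result_ok = describe_path(real)

print(result_missing)
print(result_ok)
```

If the listing shows a slightly different name, pass that exact name to pd.read_csv.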

Related

Is there a way to download a sample CSV file

I used a sample of a CSV file to build some tables in a Jupyter notebook. I now need to download that sample as a CSV file so I can look at it in Excel. Is there a way I can download the sample?
I need to download lf if possible.
Here is my code:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import io
import requests
df = pd.read_csv("diamonds.csv")
lf = df.sample(5000, random_state=999)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("seaborn")
lf.sample(5000, random_state=999)
df.sample() already returns a DataFrame, so you can export lf directly:
lf.to_csv("file_name.csv")
Let me know if it works.
Adapted from another answer, to download a file from a URL:
import urllib.request
urllib.request.urlretrieve("http://jupyter.com/diamond.csv", "diamond.csv")
If by download you mean exporting the DataFrame to a spreadsheet format, pandas has functions for that:
import pandas as pd
df = pd.read_csv("diamond.csv")
# do your stuff
df.to_csv("diamond2.csv") # export to CSV under a different name
df.to_csv("folder/diamond2.csv") # export to CSV inside an existing folder
df.to_excel("diamond2.xlsx") # export to Excel
The file will appear in the same directory as your Jupyter notebook.
You can also specify the directory:
df.to_csv('D:/folder/diamond.csv')
To check your current working directory, you can use:
import os
print(os.getcwd())
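Putting the pieces together for the sample in the question, a minimal sketch might look like the following. The tiny DataFrame stands in for diamonds.csv, and the output file name diamonds_sample.csv is an assumption made for illustration.

```python
import os
import tempfile

import pandas as pd

# Stand-in for pd.read_csv("diamonds.csv"); the values are made up.
df = pd.DataFrame(
    {"carat": [0.2, 0.3, 0.5, 0.7, 1.0], "price": [300, 400, 900, 2000, 5000]}
)

# Same idea as lf = df.sample(5000, random_state=999), on a smaller scale.
lf = df.sample(3, random_state=999)

# Write the sample to disk; index=False leaves out the row-index column.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "diamonds_sample.csv")
lf.to_csv(out_path, index=False)

# Read it back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(out_path)
print(round_trip.shape)
```

In a notebook you would normally write to a plain file name such as "diamonds_sample.csv" and then download it from the Jupyter file browser.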

I am not able to view nba_api data frame

Not sure how to resolve this error
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import nba_api
from nba_api.stats.endpoints import leagueleaders
stats = leagueleaders.LeagueLeaders(season='2017-18')
df17 = stats.get_data_frames()
df17.head()
'list' object has no attribute 'head'
Based on the error, it looks like the API you're using is giving you the data in the form of a standard list object, not as a pandas data frame. You'll need to wrap the data inside the list obtained from the API inside a pandas data frame.
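A minimal sketch of that idea, assuming get_data_frames() returns a list whose elements are already DataFrames (the plural method name suggests this, but the thread does not confirm it). The frames list below is a stand-in so the example runs without calling the API, and the player values are made up.

```python
import pandas as pd

# Stand-in for: frames = stats.get_data_frames()
frames = [pd.DataFrame({"PLAYER": ["A", "B"], "PTS": [30.1, 28.4]})]

first = frames[0]
if isinstance(first, pd.DataFrame):
    # The list already holds DataFrames: pick the one you need by index.
    df17 = first
else:
    # Otherwise wrap the raw records in a DataFrame, as the answer suggests.
    df17 = pd.DataFrame(first)

top = df17.head(1)
print(top)
```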

Issue with reading a text file in pandas

I am trying to import a txt file that has around 56 columns of different data types.
A few columns have values with a 000 prefix, which disappears once the data has been imported.
I am also getting the error message "specify dtype option on reading or set low_memory=false".
Values in certain columns have changed to "NaN" and "4.40578e+01", which is not correct.
I want the data to be imported and displayed correctly.
This is the code that I am using:
from os import path
import numpy as np
import pandas as pd
df=pd.read_csv(r"C:\Users\abc\desktop\file.txt",sep=",")
df.head()
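One possible approach, sketched here on made-up data, is to tell read_csv which columns must stay as text by passing dtype. The column names id, zip_code and amount are hypothetical; with the real file you would list the columns that carry the 000 prefix.

```python
import io

import pandas as pd

# Made-up stand-in for file.txt.
raw = "id,zip_code,amount\n1,00042,44.0578\n2,00107,\n"

# Default parsing infers integers, so the leading zeros are lost.
default_df = pd.read_csv(io.StringIO(raw), sep=",")

# Forcing the column to str keeps the values exactly as written;
# low_memory=False makes pandas infer types from the whole file at once.
text_df = pd.read_csv(
    io.StringIO(raw), sep=",", dtype={"zip_code": str}, low_memory=False
)

zip_default = default_df["zip_code"].tolist()
zip_text = text_df["zip_code"].tolist()
print(zip_default)
print(zip_text)
```

Passing dtype=str instead of a dictionary reads every column as text, which can be a safer starting point when many of the 56 columns are affected.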

How to convert the arff object loaded from a .arff file into a dataframe format?

I was able to load the .arff file using the following commands, but I was not able to extract the data from the object and convert it into a dataframe. I need this to apply machine learning algorithms to the dataframe.
Commands:
import arff
import pandas as pd
dataset = pd.DataFrame(arff.load(open('Training Dataset.arff')))
print(dataset)
Please help me to convert the data from here into a dataframe.
import numpy as np
import pandas as pd
from scipy.io.arff import loadarff
raw_data = loadarff('Training Dataset.arff')
df_data = pd.DataFrame(raw_data[0])
Try this. Hope it helps
from scipy.io.arff import loadarff
import pandas as pd
data = loadarff('Training Dataset.arff')
df = pd.DataFrame(data[0])
Similar to the answer above, but there is no need to import numpy.
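One thing to watch for with loadarff is that nominal attributes tend to come back as bytes (for example b'yes') rather than str. The sketch below, using a small made-up ARFF document instead of 'Training Dataset.arff', decodes such columns after building the DataFrame. The helper name to_text is hypothetical.

```python
import io

import pandas as pd
from scipy.io.arff import loadarff

# Made-up ARFF content standing in for the real file.
arff_text = """@relation demo
@attribute length numeric
@attribute label {yes,no}
@data
1.5,yes
2.0,no
"""

data, meta = loadarff(io.StringIO(arff_text))
df = pd.DataFrame(data)


def to_text(value):
    """Decode bytes to str and leave every other value untouched."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


# Only object or bytes columns can hold bytes values.
for col in df.columns:
    if df[col].dtype.kind in ("O", "S"):
        df[col] = df[col].map(to_text)

labels = df["label"].tolist()
print(labels)
```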

Only half of my CSV is encoded

I'm importing a CSV file into pandas. The first few names are decoded correctly, but further down the accents turn back into symbols. It's a pretty large file with almost 200 names. Is there anything I can do to fix this issue?
import sys
import codecs
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
#%matplotlib inline
from matplotlib.pylab import rcParams
sys.stdout = codecs.getwriter( "ISO-8859-1" )( sys.stdout.detach() )
rcParams['figure.figsize'] = 15, 6
data = pd.read_csv('IndNames.csv', encoding='ISO-8859-1')
pd.get_option("display.max_rows")
pd.set_option('expand_frame_repr', False)
pd.set_option('display.height', 500)
data.align(data, axis=1)
print(data.head(n=182))
Ex: José
José
Edit: ftfy does not work with dataframes
Edit 1: I can't figure out the problem. When I save it to a CSV file everything looks normal, but when I use pd.read_csv to load it again the accents are garbled.
sys.stdout = codecs.getwriter( "UTF-8" )( sys.stdout.detach() )
This is a simple fix. I don't know why it didn't work when I tried it before, but this did the trick.
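More generally, accents turn into symbols such as "Ã©" when bytes written in one encoding are decoded with another. The sketch below reproduces that effect in memory; it illustrates the mechanism under the assumption of a UTF-8 file and is not necessarily what happened with IndNames.csv.

```python
import io

import pandas as pd

# A tiny CSV encoded as UTF-8, held in memory instead of on disk.
utf8_bytes = "name\nJosé\n".encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 splits "é" into two characters.
wrong = pd.read_csv(io.BytesIO(utf8_bytes), encoding="ISO-8859-1")

# Decoding with the encoding the bytes were written in restores the name.
right = pd.read_csv(io.BytesIO(utf8_bytes), encoding="utf-8")

wrong_name = wrong["name"][0]
right_name = right["name"][0]
print(repr(wrong_name))
print(repr(right_name))
```

The same mismatch can occur on output, which is why switching the stdout writer to UTF-8 can fix the display even when the file itself was read correctly.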
