How can I read a CSV from Kaggle - Python

I want to read a CSV file from Kaggle:
import os
import pandas as pd
df = pd.read_csv('/kaggle/input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv')
print("Shape of dataframe is: {}".format(df.shape))
But I get this error:
FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv'
I took the file path from Kaggle.
Thank you for any help.

You have to adapt the path to wherever the downloaded file is stored.
df = pd.read_csv('/kaggle/input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv')
is only an example path. It points to where a Kaggle notebook mounts the dataset, so it does not exist on your own machine. If you run the code locally, change the path to the location where you saved the downloaded .csv file.
The .csv file for download is available here:
https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset
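A minimal sketch of how the path could be checked before reading, assuming the file is either at the Kaggle mount point or in a local download folder. The helper name first_existing and the ~/Downloads location are hypothetical; adjust them to your own setup.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None


# Hypothetical locations: the Kaggle notebook mount point first,
# then an assumed local download folder.
CANDIDATES = [
    "/kaggle/input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv",
    "~/Downloads/WA_Fn-UseC_-HR-Employee-Attrition.csv",
]

csv_path = first_existing(CANDIDATES)
if csv_path is None:
    print("CSV not found; checked:", CANDIDATES)
else:
    print("Reading from", csv_path)
    # df = pd.read_csv(csv_path)  # with pandas imported as pd
```

Checking the path first gives a message that lists every location tried, which is usually easier to act on than a bare FileNotFoundError.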

Related

Cocalc for Python, can't find csv files to read into pandas dataframe

I am trying to read a file into a pandas dataframe and I get a file not found error.
First I find the path:
import os
print('Get current working directory : ', os.getcwd())
print('Get current file name : ', "iris.csv")
I get this:
Get current working directory : /home/user/test
Get current file name : iris.csv
Then I try to load the file. I have tried the following:
iris = pd.read_csv("test/iris.csv")
iris = pd.read_csv("home/user/test/iris.csv")
iris = pd.read_csv("iris.csv")
Any thoughts? Thanks.
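One way to narrow this down is to list what Python actually sees in the working directory, and to check where each relative path ends up. The sketch below assumes nothing beyond the standard library; the helper names csv_files_in and resolve are made up for illustration. Note that a path without a leading slash is interpreted relative to the working directory.

```python
from pathlib import Path, PurePosixPath


def csv_files_in(directory):
    """Return the sorted names of all .csv files directly inside `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


def resolve(relative_path, base):
    """Show the location a relative path refers to when `base` is the working directory."""
    return PurePosixPath(base) / relative_path


cwd = Path.cwd()
print("Working directory:", cwd)
print("CSV files here:", csv_files_in(cwd))

# Where the attempts from the question point when run from /home/user/test:
print(resolve("test/iris.csv", "/home/user/test"))            # /home/user/test/test/iris.csv
print(resolve("home/user/test/iris.csv", "/home/user/test"))  # /home/user/test/home/user/test/iris.csv
```

If iris.csv does not show up in the listing, the file is probably not in the directory the notebook or script runs from, and the fix is to move the file or pass its absolute path.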

OSError: [Errno 36] File name too long: for python package and .txt file, pandas opening

Error OSError: [Errno 36] File name too long: for the following code:
from importlib_resources import open_text
import pandas as pd
with open_text('package.data', 'librebook.txt') as f:
    input_file = f.read()
dataset = pd.read_csv(input_file)
This is on Ubuntu 20.04, for a Python package, in the __init__.py file.
I don't want to use .readlines().
Can I structure this code differently so that this doesn't happen? Do I need to modify my OS? Some of the help I found suggested modifying the OS, but I don't want to do that if I don't need to. Thank you.
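Why not just pass in the name of the file rather than its contents? pd.read_csv treats a string argument as a path, so passing the whole text makes it look for a file whose name is the entire file contents, hence the "File name too long" error.
dataset = pd.read_csv('librebook.txt')
Alternatively, use path to get the location of the packaged file:
from importlib_resources import path
import pandas as pd
with path('package.data', 'librebook.txt') as f:
    dataset = pd.read_csv(f)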
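If the text has already been read into memory, as in the question, another option is to wrap it in io.StringIO so that pd.read_csv receives a file-like object rather than a string it would interpret as a file name. This is only a sketch: the sample text below is made up, since the real contents of librebook.txt are not shown.

```python
import io

import pandas as pd

# Stand-in for the text returned by f.read(); the real file's contents are unknown.
text = "title,year\nBook A,2001\nBook B,2005\n"

# Wrapping the string in StringIO gives read_csv a file-like object
# instead of something it would interpret as a file name.
dataset = pd.read_csv(io.StringIO(text))
print(dataset.shape)  # (2, 2)
```

Inside the with block, passing the open handle f itself to pd.read_csv should work the same way, since read_csv accepts file-like objects.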

Reading in txt file as pandas dataframe from a folder within a zipped folder

I want to read in a txt file that sits in a folder within a zipped folder as a pandas data frame.
I've looked at how to read in a txt file (Load data from txt with pandas) and how to access a file inside a zipped folder (Download Returned Zip file from URL).
The problem is I get a KeyError message with my code.
I think it's because my txt file sits in a folder within a folder?
Thanks for any help!
# MWE
import requests
import pandas as pd
from zipfile import ZipFile
from io import BytesIO
txt_raw = 'hcc-data.txt'
zip_raw = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00423/hcc-survival.zip'
r = requests.get(zip_raw)
files = ZipFile(BytesIO(r.content))
df_raw = pd.read_csv(files.open(txt_raw), sep=",", header=None)
# ERROR
KeyError: "There is no item named 'hcc-data.txt' in the archive"
You need to give the file's full path inside the archive:
txt_raw = 'hcc-survival/hcc-data.txt'
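When the folder layout of an archive is not known in advance, ZipFile.namelist() shows the member names exactly as they must be passed to open(). The sketch below builds a small in-memory archive instead of downloading the real one, and the helper find_member is a hypothetical name introduced for illustration.

```python
from io import BytesIO
from zipfile import ZipFile


def find_member(archive, basename):
    """Return the first archive member whose final path component is `basename`."""
    for name in archive.namelist():
        if name.rsplit("/", 1)[-1] == basename:
            return name
    raise KeyError(f"No member named {basename!r} in the archive")


# Build a small in-memory archive that mimics a file nested in a folder.
buffer = BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("hcc-survival/hcc-data.txt", "1,2,3\n4,5,6\n")

with ZipFile(buffer) as files:
    print(files.namelist())  # ['hcc-survival/hcc-data.txt']
    member = find_member(files, "hcc-data.txt")
    print(member)  # hcc-survival/hcc-data.txt
```

The returned member name can then be passed to files.open(...) and on to pd.read_csv as in the question.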

Python iterating over excel files in a folder

I am interested in getting this script to open an Excel file and save it again as a .csv or .txt file. I'm pretty sure the problem is the iteration: I haven't coded it correctly to iterate over the contents of the folder. I am new to Python, and I managed to get this code to successfully print a list of the items in the folder using the commented-out line. Can someone please advise what needs to be fixed?
My error is: raise XLRDError('Unsupported format, or corrupt file: ' + msg)
from xlrd import open_workbook
import csv
import glob
import os
import openpyxl
cwd = os.getcwd()
print(cwd)
FileList = glob.glob('*.xlsx')
#print(FileList)
for i in FileList:
    rb = open_workbook(i)
    wb = copy(rb)
    wb.save('new_document.csv')
I would just use:
import pandas as pd
import glob
import os
file_list = glob.glob('*.xlsx')
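for file in file_list:
    # Keep only the file name, dropping any directory part
    filename = os.path.split(file)[1]
    # Swap the extension for .csv and write the first sheet out as CSV
    pd.read_excel(file).to_csv(os.path.splitext(filename)[0] + '.csv', index=False)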
It appears that your error is related to the Excel files rather than your code. Check the following, in this order:
1. Check that your files aren't also open in Excel at the same time.
2. Check that your files aren't encrypted.
3. Check that your version of xlrd supports the files you are reading (xlrd 2.0 and later read only .xls files, not .xlsx).
Any of these could have caused your error.
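To see which of the checks above applies, it can help to look at the first bytes of each file that glob picks up. The sketch below is an assumption-laden diagnostic, not part of xlrd or pandas: the helper name sniff_excel and its return labels are invented here. It relies on .xlsx files being ZIP archives and legacy .xls files being OLE2 containers.

```python
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx files are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 container (legacy .xls)


def sniff_excel(path):
    """Classify a file by its leading bytes: 'xlsx', 'ole2', 'lock-file' or 'unknown'."""
    path = Path(path)
    # Excel writes '~$name.xlsx' owner files next to workbooks that are open;
    # glob('*.xlsx') matches them, but they are not workbooks.
    if path.name.startswith("~$"):
        return "lock-file"
    with path.open("rb") as handle:
        head = handle.read(8)
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


for candidate in sorted(Path(".").glob("*.xlsx")):
    print(candidate.name, "->", sniff_excel(candidate))
```

A result of 'lock-file' suggests the workbook is open in Excel. A result of 'ole2' for a file named .xlsx may indicate a renamed .xls or, as far as I know, a password-protected workbook. A result of 'unknown' points to a file that is not an Excel workbook at all.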

Unable to import JSON file into a Python file

I am working on a dictionary project. So, I downloaded a JSON file and placed it on my desktop. I tried to import it into my Python file but it says the file is not found.
FileNotFoundError: [Errno 2] No such file or directory: 'data.json'
Here's my code:
import json
data = json.loads(open('data.json'))
print(data)
You can use os.path.expanduser to get the current user's home directory, and then os.path.join to build the full path to data.json in the Desktop directory. Note also that json.load reads from an open file object, whereas json.loads expects a string.
Use:
import os
import json
filepath = os.path.join(os.path.expanduser("~"), "Desktop", "data.json")
with open(filepath) as file:
    data = json.load(file)
print(data)
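The same path can also be built with pathlib, which some people find easier to read. This is a sketch under the assumption that data.json sits directly in the Desktop folder of the current user; the helper name load_json is introduced here for illustration.

```python
import json
from pathlib import Path


def load_json(path):
    """Parse the JSON document stored at `path` and return the resulting object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


# Assumed location, matching the question: data.json on the user's Desktop.
desktop_file = Path.home() / "Desktop" / "data.json"

if desktop_file.is_file():
    print(load_json(desktop_file))
else:
    print("Not found:", desktop_file)
```

Checking is_file() first prints the full path that was tried, which makes it easy to see whether the script is looking in the folder you expect.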
