Read a .csv into pandas from F: drive on Windows 7 - python

I have a .csv file on my F: drive on Windows 7 64-bit that I'd like to read into pandas and manipulate.
None of the examples I see read from anything other than a simple file name (e.g. 'foo.csv').
When I try this I get error messages that aren't making the problem clear to me:
import pandas as pd
trainFile = "F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv"
trainData = pd.read_csv(trainFile)
The error message says:
IOError: Initializing from file failed
I'm missing something simple here. Can anyone see it?
Update:
I did get more information like this:
import csv
if __name__ == '__main__':
    trainPath = 'F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv'
    trainData = []
    with open(trainPath, 'r') as trainCsv:
        trainReader = csv.reader(trainCsv, delimiter=',', quotechar='"')
        for row in trainReader:
            trainData.append(row)
    print trainData
I got a permission error on read. When I checked the properties of the file, I saw that it was read-only. I was able to read 892 lines successfully after unchecking it.
Now pandas is working as well. No need to move the file or amend the path. Thanks for looking.
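For future readers: a quick way to check (and clear) the read-only flag from Python, a sketch using the path from the question:
import os, stat
trainPath = 'F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv'
# On Windows, the read-only attribute shows up as a missing write bit
is_writable = bool(os.stat(trainPath).st_mode & stat.S_IWRITE)
print(is_writable)  # False means the read-only attribute is set
# os.chmod(trainPath, stat.S_IWRITE)  # clears the read-only attribute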

I cannot promise that this will work, but it's worth a shot:
import pandas as pd
import os
trainFile = "F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv"
pwd = os.getcwd()
os.chdir(os.path.dirname(trainFile))
trainData = pd.read_csv(os.path.basename(trainFile))
os.chdir(pwd)
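A slightly safer variant of the same idea (a sketch): restore the working directory even if read_csv raises:
import os
import pandas as pd
trainFile = "F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv"
pwd = os.getcwd()
try:
    os.chdir(os.path.dirname(trainFile))
    trainData = pd.read_csv(os.path.basename(trainFile))
finally:
    os.chdir(pwd)  # put the working directory back no matter what happened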

A better solution is to use raw string literals like r'pathname\filename' rather than 'pathname\filename', so backslashes are not interpreted as escape characters. See the Lexical Analysis section of the Python documentation for more details.
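For example, these two literals denote the same Windows path (an illustrative path, not the one from the question):
path_escaped = 'C:\\data\\train.csv'   # each backslash must be doubled
path_raw = r'C:\data\train.csv'        # raw string: backslashes taken literally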

I also ran into the same issue and resolved it.
Check that the path to your file is correct.
I initially had the path like
dfTrain = pd.read_csv("D:\\Kaggle\\labeledTrainData.tsv", header=0, delimiter="\t", quoting=3)
This returned an error because the path was wrong: the file was actually inside a folder of the same name. I changed the path as below, and it works fine.
dfTrain = pd.read_csv("D:\\Kaggle\\labeledTrainData.tsv\\labeledTrainData.tsv", header=0, delimiter="\t", quoting=3)
My earlier path simply did not point at the file. Hope you get it resolved.

This happens to me quite often. Usually I open the csv file in Excel, and save it as an xlsx file, and it works.
So instead of
df = pd.read_csv(r"...\file.csv")
Use:
df = pd.read_excel(r"...\file.xlsx")

If you're sure the path is correct, make sure no other programs have the file open. I got that error once, and closing the Excel file made the error go away.

Try this:
import os
import pandas as pd
trainFile = os.path.join('F:', os.sep, 'Projects', 'Python', 'coursera', 'intro-to-data-science', 'train.csv')
trainData = pd.read_csv(trainFile)
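If you are on Python 3.4+, pathlib gives the same drive-portable construction (a sketch; recent pandas versions accept Path objects directly):
from pathlib import Path
import pandas as pd
# Mirrors the path from the question; Path handles the separators itself
trainFile = Path('F:/') / 'Projects' / 'Python' / 'coursera' / 'intro-to-data-science' / 'train.csv'
trainData = pd.read_csv(trainFile)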

Related

How to write pandas dataframe into Databricks dbfs/FileStore?

I'm new to Databricks and need help writing a pandas dataframe into the Databricks local file system.
I searched Google but could not find a case similar to this, and I also tried the help guide provided by Databricks, but that did not work either. I attempted the changes below to try my luck; the commands run just fine, but the file never gets written to the directory (I expect a wrtdftodbfs.txt file to be created).
df.to_csv("/dbfs/FileStore/NJ/wrtdftodbfs.txt")
Result: throws the below error
FileNotFoundError: [Errno 2] No such file or directory:
'/dbfs/FileStore/NJ/wrtdftodbfs.txt'
df.to_csv("\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")
Result: No errors, but nothing written either
df.to_csv("dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")
Result: No errors, but nothing written either
df.to_csv(path ="\\dbfs\\FileStore\\NJ\\",file="wrtdftodbfs.txt")
Result: TypeError: to_csv() got an unexpected keyword argument 'path'
df.to_csv("dbfs:\\FileStore\\NJ\\wrtdftodbfs.txt")
Result: No errors, but nothing written either
df.to_csv("dbfs:\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")
Result: No errors, but nothing written either
The directory exists, and files created manually show up there, but pandas to_csv never writes anything and never errors out.
dbutils.fs.put("/dbfs/FileStore/NJ/tst.txt","Testing file creation and existence")
dbutils.fs.ls("dbfs/FileStore/NJ")
Out[186]: [FileInfo(path='dbfs:/dbfs/FileStore/NJ/tst.txt',
name='tst.txt', size=35)]
Appreciate your time and pardon me if the enclosed details are not clear enough.
Try this in your Databricks notebook:
import pandas as pd
from io import StringIO
data = """
CODE,L,PS
5d8A,N,P60490
5d8b,H,P80377
5d8C,O,P60491
"""
df = pd.read_csv(StringIO(data), sep=',')
#print(df)
df.to_csv('/dbfs/FileStore/NJ/file1.txt')
pandas_df = pd.read_csv("/dbfs/FileStore/NJ/file1.txt", header='infer')
print(pandas_df)
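If writing straight to /dbfs still fails on your cluster, a common workaround (a sketch; the /tmp path and target folder are illustrative) is to let pandas write to the driver's local disk and then copy the file into DBFS with dbutils:
# Write to local driver storage first, then copy into DBFS
local_path = "/tmp/wrtdftodbfs.txt"
df.to_csv(local_path)
dbutils.fs.cp("file:" + local_path, "dbfs:/FileStore/NJ/wrtdftodbfs.txt")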
This worked out for me:
outname = 'pre-processed.csv'
outdir = '/dbfs/FileStore/'
dfPandas.to_csv(outdir+outname, index=False, encoding="utf-8")
To download the file, add files/filename to your notebook URL (before the question mark ?):
https://community.cloud.databricks.com/files/pre-processed.csv?o=189989883924552#
(you need to adapt your own home URL; for me it is:
https://community.cloud.databricks.com/?o=189989883924552#)
(screenshot: DBFS file explorer)

Unable to read a parquet file

I am breaking my head over this right now. I am new to parquet files, and I am running into a lot of issues with them.
I am thrown an error that reads OSError: Passed non-file path: \datasets\proj\train\train.parquet each time I try to create a df from it.
I've tried this:
pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas()
AND
od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow')
I also changed the letter of the drive the dataset resides on, and it's the SAME THING!
It's the same with all engines.
PLEASE HELP!
This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:
import pandas as pd
with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
    df = pd.read_parquet(f, engine='pyarrow')
Try using fastparquet as the engine; that worked for me:
od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='fastparquet')

pandas.read_csv cant find my path error

So I tried to run this code.
import pandas
i = input("hi input a csv file..")
df = pandas.read_csv(i)
and I got an error saying
FileNotFoundError: File b'"C:\\Users\\thomas.swenson\\Downloads\\hi.csv"' does not exist
But then if I hard-code that path that 'doesn't exist' into my program,
import pandas
df = pandas.read_csv("C:\\Users\\thomas.swenson\\Downloads\\hi.csv")
it works just fine.
Anyone know why this may be happening?
I'm running python 3.6 and using a virtualenv
It looks like the pasted input included another set of quotes around the path (you can see them embedded in the file name in the error message), so I'll just strip them off and it works fine.
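A minimal sketch of that fix, assuming the pasted path is wrapped in double quotes:
import pandas
# strip('"') removes the surrounding quotes that come along with a pasted path
i = input("hi input a csv file..").strip('"')
df = pandas.read_csv(i)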

How to read HDF5 files that have only datasets (no groups) using h5py?

I have HDF5 files that I would like to open using the Python module h5py (in Python 2.7).
This is easy when I have a file with groups and datasets:
import h5py as hdf
with hdf.File(relative_path_to_file, 'r') as f:
    my_data = f['a_group']['a_dataset'].value
However, in my current situation I do not have groups. There are only datasets. Unfortunately, I cannot access my data no matter what I try. None of the following work (all break with KeyErrors or ValueErrors):
my_data = f['a_dataset'].value #KeyError
my_data = f['/a_dataset'].value #KeyError
my_data = f['/']['a_dataset'].value #KeyError
my_data = f['']['a_dataset'].value #ValueError
my_data = f['.']['a_dataset'].value #KeyError
I can remake my files to have a group if there is no solution. It really seems like there should be a solution, though...
It seems like h5py is not seeing any keys:
f.keys()
[]
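For debugging, h5py can enumerate every object a file exposes; a minimal sketch (visit() calls the callback with the name of each group and dataset):
def show(name):
    print(name)
with hdf.File(relative_path_to_file, 'r') as f:
    f.visit(show)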
I found the issue, which I think is an issue h5py should address.
The issue (which I originally forgot to detail in the question, now edited) is that I open the hdf5 file with a relative file path. When I use an absolute file path, everything works perfectly.
Sadly, this is going to cause me problems down the road as my code is intended to run portably on different machines...
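A portable workaround (a sketch): resolve the relative path to an absolute one at runtime before opening the file:
import os
import h5py as hdf
with hdf.File(os.path.abspath(relative_path_to_file), 'r') as f:
    my_data = f['a_dataset'].value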
Thanks to gspr and jimmyb for their help :-)
For me, it worked fine even when I was using a relative path.
To write:
import h5py
fileName = "data/hdf5/topo.hdf5"
with h5py.File(fileName, 'w') as f:
    # z is your NumPy array of topography values
    dset = f.create_dataset('topography', data=z, dtype='float32')
To read the data back:
with h5py.File(fileName, 'r') as f:
    my_data = f['.']['topography'].value
I think that this should work:
f['.']['a_dataset']
And you might try to do:
dir(f['/'])
dir(f['.'])

File path name for NumPy's loadtxt()

I was wondering if somebody had some information on how to load a CSV file using NumPy's loadtxt(). For some reason it claims that there is no such file or directory, when clearly there is. I've even copy/pasted the full path (with and without the leading / for root), but to no avail.
from numpy import *
FH = loadtxt("/Users/groenera/Desktop/file.csv")
or
from numpy import *
FH = loadtxt("Users/groenera/Desktop/file.csv")
The documentation for loadtxt is very unhelpful about this.
You might have forgotten the double backslashes, "\\". Windows paths require escaping each backslash (or a raw string).
So instead of
FH = loadtxt("/Users/groenera/Desktop/file.csv")
do this:
FH = loadtxt("C:\\Users\\groenera\\Desktop\\file.csv")
This is probably not a loadtxt problem. Try simply
f = open("/Users/groenera/Desktop/file.csv")
to check whether loadtxt is actually at fault. Also, try using a Unicode string:
f = open(u"/Users/groenera/Desktop/file.csv")
I am using PyCharm, Python 3.5.2.
Right-click on your project, create a new file named 'planet.csv', and paste your text into it.
Add a header to each column.
Code:
import pandas as pd
data = pd.read_csv('planet.csv', sep="\n")
print(data)
