import openpyxl
input_workbook1 = openpyxl.load_workbook('/dbfs/FileStore/Test/my_excel.xlsx')
sheet_1 = input_workbook1.active
sheet_1['A2'] = 'A2'
input_workbook1.save('/dbfs/FileStore/Test/Output.xlsx')
OSError: [Errno 95] Operation not supported
I tried reading the excel file directly using openpyxl in databricks , I can able to read and modify directly without pandas/dataframes, but when I am trying to save i.e last line in above code facing the issue.I tried exactly the same way but facing the above error , can anyone help me please
I tried doing the same procedure and it gave me the same error OSError: [Errno 95] Operation not supported. The reason for this is that there is a limitation that random writes do not work on the local file system and here is the official Microsoft documentation (Local File API limitations) which refers to this issue.
So, try instead of trying to write to the local file system, write the file to /databricks/driver/ path and then copy/move the file to required directory.
Modify your code as following:
import openpyxl
input_workbook1 = openpyxl.load_workbook('/dbfs/FileStore/Test/my_excel.xlsx')
sheet_1 = input_workbook1.active
sheet_1['A2'] = 'A2'
input_workbook1.save('Output.xlsx')
#will be saved to '/databricks/driver/'.
#Use dbutils.fs.ls('/databricks/driver/') to view.
from shutil import move
move('/databricks/driver/Output.xlsx','/dbfs/FileStore/Test/')
wb1 = openpyxl.load_workbook('/dbfs/FileStore/Output.xlsx')
ws1 = wb1.active
for row in ws1.iter_rows():
print([col.value for col in row])
The above code will successfully move your file to the required path without any errors.
I am breaking my head over this right now. I am new to this parquet files, and I am running into a LOT of issues with it.
I am thrown an error that reads OSError: Passed non-file path: \datasets\proj\train\train.parquet each time I try to create a df from it.
I've tried this:
pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas()
AND
od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow')
I also changed the drive letter of the drive the dataset resides, and it's the SAME THING!
It's the same with all engines.
PLEASE HELP!
This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:
import pandas as pd
with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
df = pd.read_parquet(f, engine='pyarrow')
Try using fastparquet as engine, worked for me.
engine = "fastparquet"
So I tried to run this code.
import pandas
i = input("hi input a csv file..")
df = pandas.read_csv(i)
and I got an error saying
FileNotFoundError: File b'"C:\\Users\\thomas.swenson\\Downloads\\hi.csv"' does not exist
but then if I hard code that path that 'doesn't exist' into my program it works fine.
import pandas
df = pandas.read_csv("C:\\Users\\thomas.swenson\\Downloads\\hi.csv")
it works just fine.
Anyone know why this may be happening?
I'm running python 3.6 and using a virtualenv
looks like the input function was placing another set of quotes around the input.
so ill just have to remove them and it works fine.
I'm trying to read a file using python and I keep getting this error
ERROR: Line magic function `%user_vars` not found.
My code is very basic just
names = read_csv('Combined data.csv')
names.head()
I get this for anytime I try to read or open a file. I tried using this thread for help.
ERROR: Line magic function `%matplotlib` not found
I'm using enthought canopy and I have IPython version 2.4.1. I made sure to update using the IPython installation page for help. I'm not sure what's wrong because it should be very simple to open/read files. I even get this error for opening text files.
EDIT:
I imported traceback and used
print(traceback.format_exc())
But all I get is none printed. I'm not sure what that means.
Looks like you are using Pandas. Try the following (assuming your csv file is in the same path as the your script lib) and insert it one line at a time if you are using the IPython Shell:
import pandas as pd
names = pd.read_csv('Combined data.csv')
names.head()
I have a .csv file on my F: drive on Windows 7 64-bit that I'd like to read into pandas and manipulate.
None of the examples I see read from anything other than a simple file name (e.g. 'foo.csv').
When I try this I get error messages that aren't making the problem clear to me:
import pandas as pd
trainFile = "F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv"
trainData = pd.read_csv(trainFile)
The error message says:
IOError: Initializing from file failed
I'm missing something simple here. Can anyone see it?
Update:
I did get more information like this:
import csv
if __name__ == '__main__':
trainPath = 'F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv'
trainData = []
with open(trainPath, 'r') as trainCsv:
trainReader = csv.reader(trainCsv, delimiter=',', quotechar='"')
for row in trainReader:
trainData.append(row)
print trainData
I got a permission error on read. When I checked the properties of the file, I saw that it was read-only. I was able to read 892 lines successfully after unchecking it.
Now pandas is working as well. No need to move the file or amend the path. Thanks for looking.
I cannot promise that this will work, but it's worth a shot:
import pandas as pd
import os
trainFile = "F:/Projects/Python/coursera/intro-to-data-science/kaggle/data/train.csv"
pwd = os.getcwd()
os.chdir(os.path.dirname(trainFile))
trainData = pd.read_csv(os.path.basename(trainFile))
os.chdir(pwd)
A better solution is to use literal strings like r'pathname\filename' rather than 'pathname\filename'. See Lexical Analysis for more details.
I also got the same issue and got that resolved .
Check your path for the file correctly
I initially had the path like
dfTrain = pd.read_csv("D:\\Kaggle\\labeledTrainData.tsv",header=0,delimiter="\t",quoting=3)
This returned an error because the path was wrong .Then I have changed the path as below.This is working fine.
dfTrain = dfTrain = pd.read_csv("D:\\Kaggle\\labeledTrainData.tsv\\labeledTrainData.tsv",header=0,delimiter="\t",quoting=3)
This is because my earlier path was not correct.Hope you get it reolved
This happens to me quite often. Usually I open the csv file in Excel, and save it as an xlsx file, and it works.
so instead of
df = pd.read_csv(r"...\file.csv")
Use:
df = pd.read_excel(r"...\file.xlsx")
If you're sure the path is correct, make sure no other programs have the file open. I got that error once, and closing the Excel file made the error go away.
Try this:
import os
import pandas as pd
trainFile = os.path.join('F:',os.sep,'Projects','Python','coursera','intro-to-data-science','train.csv' )
trainData = pd.read_csv(trainFile)