Pandas excel reading buffer error (python 3) - python

I am having a problem reading an Excel file from a download link using pandas. The excelString below loads correctly and looks like an Excel file, but when I pass it to pd.ExcelFile, pandas says the file name is too long. Any assistance would be appreciated. This is a useful generic problem to solve for anyone accessing iShares index membership info.
import urllib
import pandas as pd
f = urllib.request.urlopen('https://www.ishares.com/us/239714/fund-download.dl')
excelString = f.read().decode('utf-8')
pd.ExcelFile(excelString)
The Error returned is OSError: [Errno 36] File name too long

Works fine for me using Python3 and pandas 0.16.2 - do you have the latest version?
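The error message hints at the cause: when pd.ExcelFile is given a str, it treats it as a file path, so the entire decoded download is taken for a file name. Below is a minimal sketch of the usual workaround, which keeps the response as bytes and wraps it in io.BytesIO. It assumes the download really is a workbook that one of pandas' Excel engines can parse, which has not been verified for this iShares link. The helper names are illustrative and not part of pandas.

```python
import io
import urllib.request

import pandas as pd


def fetch_bytes(url: str) -> bytes:
    """Download a URL and return the raw, undecoded response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def as_excel_buffer(raw: bytes) -> io.BytesIO:
    """Wrap raw workbook bytes in a file-like object that pandas can read from."""
    return io.BytesIO(raw)


def load_workbook_from_url(url: str) -> pd.ExcelFile:
    """Fetch a workbook and open it without treating its content as a path."""
    return pd.ExcelFile(as_excel_buffer(fetch_bytes(url)))
```

With these helpers, load_workbook_from_url('https://www.ishares.com/us/239714/fund-download.dl') would replace the last three lines of the question's code, and the returned object's sheet_names attribute lists the sheets available to parse.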

Related

I am getting an error expected <class 'openpyxl.styles.fills.Fill'> reading an excel file with pandas read_excel

I am trying to read an excel file with pandas read_excel function, but I keep getting the following error:
expected <class 'openpyxl.styles.fills.Fill'>
The exact code I typed is:
corrosion_df=pd.read_excel('Corrosion.xlsx')
I already double checked the filename and it is correct. The file is also saved in the correct directory. I don't know what's going wrong because I used this method many times and until now it has always worked. Thank you very much in advance.
I had the same issue, but I found that when I made some changes to the spreadsheet and resaved it, the problem stopped.
I think the answer here is the most helpful:
Error when trying to use module load_workbook from openpyxl
My data was also being autogenerated by another site, so I'm assuming there is some slight corruption in their process. I'm adding a CSV option to my project just to give an alternative.
The only way was to manually open it, save it and load it.
My workaround for it is to convert the file using libreoffice:
I ran this command line in my jupyter notebook:
!libreoffice --convert-to xls 'my_file.xlsx'
This creates a new file named my_file.xls, which can now be opened with pandas.
import pandas as pd
df = pd.read_excel('my_file.xls')
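Outside a notebook, the same conversion can be scripted. This is a rough sketch that assumes a libreoffice executable is on the PATH (on some installs it is called soffice instead). The --headless and --outdir flags are additions to the command shown above, and both function names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, target_format: str = "xls") -> List[str]:
    """Build the LibreOffice command line that converts `source` in place."""
    source_path = Path(source)
    return [
        "libreoffice",
        "--headless",                          # assumption: run without opening a window
        "--convert-to", target_format,
        "--outdir", str(source_path.parent),   # write the result next to the source
        str(source_path),
    ]


def convert_with_libreoffice(source: str, target_format: str = "xls") -> Path:
    """Run the conversion and return the path LibreOffice is expected to write."""
    subprocess.run(build_convert_command(source, target_format), check=True)
    return Path(source).with_suffix("." + target_format)
```

Calling convert_with_libreoffice('my_file.xlsx') would then return the path of the converted file, which can be handed to pd.read_excel.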
I had the same problem. I just resaved the excel file.

How to read a Time-series data in Python 3.7

So today I started working with time-series data in Python. First, I tried reading the time-series data from a CSV file using the pandas library (imported as pd).
Unfortunately, I keep getting the error shown below. Any help on this would be highly appreciated.
PS: I am using Python 3.7.3
address = 'C:/Users/Anih John/Desktop/Python-workstation/ff/Superstore-Sales.csv'
Superstore = pd.read_csv(address, index_col='Order Date',parse_dates=True)
print(Superstore)
Then I get the following error:
Unable to open 'parsers.pyx': Unable to read file (Error: File not found (c:\users\anih john\desktop\python-workstation\ff\pandas\_libs\parsers.pyx)).
I would try with the following commands:
import pandas as pd
address = 'C:\\Users\\Anih John\\Desktop\\Python-workstation\\ff\\Superstore-Sales.csv'
Superstore = pd.read_csv(address, index_col='Order Date',parse_dates=True, engine='python',sep=',')
print(Superstore)
EDIT: Or simply pass the path directly to the read_csv function.
import pandas as pd
Superstore = pd.read_csv('C:\\Users\\Anih John\\Desktop\\Python-workstation\\ff\\Superstore-Sales.csv', index_col='Order Date',parse_dates=True, engine='python',sep=',')
print(Superstore)
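To tell whether the problem lies in the read_csv arguments or in the file and its path, it can help to run the same call on a tiny in-memory sample first. This is only a sketch: the rows below are made up, and only the 'Order Date' column name comes from the question.

```python
import io

import pandas as pd

# Made-up sample rows; the real Superstore-Sales.csv has more columns.
SAMPLE_CSV = """Order Date,Sales
2014-01-03,16.45
2014-01-04,288.06
2014-01-05,19.54
"""


def read_time_series(source, date_column="Order Date"):
    """Read a CSV and use the parsed date column as the index."""
    return pd.read_csv(source, index_col=date_column, parse_dates=True)


superstore = read_time_series(io.StringIO(SAMPLE_CSV))
print(type(superstore.index).__name__)
print(superstore)
```

If this runs cleanly but the real file does not, the arguments are fine and the cause is more likely the path, the encoding, or the contents of the file itself.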

Python pandas read_csv returning FileNotFoundError despite existing Mac

I am trying to read into a pandas dataframe from a csv. The data is in the format:
date,total_bytes
2018-08-27,1.84E+14
2018-08-30,1.90E+14
2018-08-31,1.93E+14
My code looks like:
from pandas import read_csv
from pandas import datetime
from matplotlib import pyplot
series = read_csv(r'/Users/taylorjewell/Desktop/dataset_size_daily.csv', header=0)
print(series.head())
series.plot()
pyplot.show()
Despite that path existing (I have checked countless times), I am getting a file not found exception for some reason:
FileNotFoundError: File b'/Users/taylorjewell/Desktop/dataset_size_daily' does not exist
I am running this on a mac if that is relevant. Any help you are able to offer would be much appreciated!!
For file paths, I would suggest using pathlib:
from pathlib import Path
data_file = Path("/Users/taylorjewell/Desktop/dataset_size_daily.csv")
series = read_csv(data_file, header=0)
However, it also depends on where you are trying to access the file from.
I don't think you need the r prefix on a Mac. Try:
read_csv('/Users/taylorjewell/Desktop/dataset_size_daily.csv', header=0)
I just ran into this issue today and wanted to share: if you download a CSV file to a Mac, then open the file and save it, the file extension changes to .numbers. So make sure you just move the file without opening it, and double-check that the file extension is .csv.
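Both the changed extension and the path in the error message above (which lacks the .csv suffix) can be checked before calling read_csv. A small sketch using only pathlib follows; the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def same_stem_files(expected: Path) -> List[Path]:
    """List entries next to `expected` that share its base name, whatever the extension."""
    parent = expected.parent
    if not parent.is_dir():
        return []
    return sorted(p for p in parent.iterdir() if p.stem == expected.stem)


def explain_missing(expected: Path) -> str:
    """Describe whether `expected` exists and, if not, what similar files do."""
    if expected.is_file():
        return f"{expected} exists"
    candidates = same_stem_files(expected)
    if candidates:
        names = ", ".join(p.name for p in candidates)
        return f"{expected.name} not found, but found: {names}"
    return f"{expected.name} not found in {expected.parent}"
```

For example, print(explain_missing(Path('/Users/taylorjewell/Desktop/dataset_size_daily.csv'))) would report a dataset_size_daily.numbers file sitting in the same folder, if one exists.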

How to convert a csv file to a dataframe in Python 3.6 [duplicate]

I'm trying to tackle the Kaggle Titanic challenge. Bear with me, as I'm fairly new to data science. I was previously struggling to get the following code to work; see my previous question, Reading CSV files in Python, using Jupyter Notebook through IntelliJ IDEA:
import numpy as np
import pandas as pd
from pandas import Series,Dataframe
titanic_df = pd.read_csv('train.csv')
titanic.head()
However, using the code below, I am able to open the file and read and print its contents, but I need to convert the data to a dataframe so that it can be worked with. Any suggestions?
file_path = '/Volumes/LACIE SETUP/Data_Science/Data_Analysis_Viz_InPython/Example_Projects/train.csv'
with open(file_path) as train_fp:
    for line in train_fp:
        print(line)
The code above was able to print out the data, but when I tried passing 'file_path' to:
titanic_df = pd.read_csv('file_path.csv')
I received the same error as before. I'm not sure what I'm doing wrong. I know the file 'train.csv' exists in that location because 1) I put it there and 2) its contents can be printed when pointed to its location.
So what the heck am I doing wrong?
read_csv creates a pandas DataFrame, so as long as your file path is right, the following code should work. Also, make sure to pass the file_path variable and not the string "file_path.csv".
import pandas as pd
file_path = '/Volumes/LACIE SETUP/Data_Science/Data_Analysis_Viz_InPython/Example_Projects/train.csv'
titanic_df = pd.read_csv(file_path)
titanic_df.head()

CParserError: Error tokenizing data

I'm having some trouble reading a csv file
import pandas as pd
df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)
I get
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5
and when I add sep=None to the read_csv call I get another error:
Error: line contains NULL byte
I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.
The CSV file is totally fine; I checked it and I see nothing wrong with it.
Here are the errors I get:
In your actual code, the line is:
>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)
You are trying to read an Excel file, not a plain-text CSV, which is why this is not working.
Excel files (xlsx) are stored in a binary container format and cannot be read as simple text files the way CSV files can.
You need to either convert the Excel file to CSV (note: if it has multiple sheets, each sheet should be converted to its own CSV file) and then read that, or read the Excel file directly.
To read it directly, you can use read_excel, or a library like xlrd, which is designed to read the binary format of Excel files; see Reading/parsing Excel (xls) files with Python for more information on that.
Use read_excel instead of read_csv if the file is an Excel file:
import pandas as pd
df = pd.read_excel("Data_Matches_tekha.xlsx")
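When a file's extension cannot be trusted, its first few bytes usually reveal what it is: .xlsx files are ZIP archives and start with PK\x03\x04, while legacy .xls files use the OLE2 compound-file header. The sketch below checks for those two signatures. The function name and the returned labels are invented for illustration, and a ZIP or OLE2 signature does not by itself prove the file is a spreadsheet.

```python
XLSX_MAGIC = b"PK\x03\x04"                        # ZIP local file header (.xlsx is a ZIP archive)
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file header (legacy .xls)


def sniff_format(path) -> str:
    """Guess a file's container format from its leading bytes."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(XLSX_MAGIC):
        return "xlsx-or-zip"
    if head.startswith(XLS_MAGIC):
        return "xls-or-ole2"
    return "text-or-unknown"
```

A result starting with "xls" suggests trying pd.read_excel, and anything else suggests pd.read_csv. A binary header of either kind would also explain the "line contains NULL byte" error from the csv reader.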
I hit the same error after writing some data with to_csv and reading it back in another script. An easy way around it, without going through pandas' read functions, is the pickle module.
pickle is part of the Python standard library, so there is nothing to install.
First, write your data using the code below:
import pickle
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)
Then load your data in the other script using:
import pickle
with open(path, 'rb') as input_file:
    data = pickle.load(input_file)
Note that if the script reading the data runs on a different Python version from the one that wrote it, you can choose a compatible format in the writing step by passing protocol=x to pickle.dump; for example, protocol=2 is the highest protocol that Python 2 can read.
I hope this can be of any use.
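For reference, here is a self-contained round trip with an explicit protocol. It is a sketch rather than a recommendation: the sample dictionary and the helper names are made up, and pickle files should only be loaded when they come from a source you trust.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path, protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize `obj` to `path` using the given pickle protocol."""
    with open(path, "wb") as output_file:
        pickle.dump(obj, output_file, protocol=protocol)


def load_object(path):
    """Load and return the object stored at `path`."""
    with open(path, "rb") as input_file:
        return pickle.load(input_file)


# Made-up data standing in for whatever was previously written with to_csv.
records = {"match_id": [1, 2], "score": ["2-1", "0-0"]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "records.pkl"
    save_object(records, target, protocol=2)  # protocol 2 stays readable from Python 2
    restored = load_object(target)

print(restored == records)
```

For DataFrames specifically, pandas also offers DataFrame.to_pickle and pd.read_pickle, which wrap the same mechanism.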
