This question already has answers here: pandas.read_excel parameter "sheet_name" not working (6 answers). Closed 1 year ago.
I am trying to read an Excel file with pandas using the code below:
path = "QVI_transaction_data.xlsx"
I also tried "./QVI_transaction_data.xlsx" instead of the path above. The name is copied and pasted from os.listdir(), so there are no transcription errors.
pd.read_excel(path, sheet_name = "in")
but it didn't work; it raises this error:
OSError: [Errno 22] Invalid argument
I also tried without the sheet_name argument. Other posts say there is a problem with the file name, but I have worked with pandas before and I don't think anything is wrong with the name. Does anyone know what is wrong?
This is what the file looks like:
One option is to convert the .xlsx file to a .csv file (in Excel, via File > Export as CSV) and then load it like this:
import pandas as pd
Data=pd.read_csv("File Name...")
print(Data)
Or, if you want to load the Excel file directly:
import pandas as pd
file = 'path_of_excel_file'
new_data = pd.read_excel(file)
print(new_data)
I have tried both ways, as CSV and as Excel.
Since we don't have your data to reproduce the problem, there may be different causes and different solutions.
Here are a few situations that may point you in the right direction:
Situation 1:
If you are using an old version of pandas (before 0.21), the argument is called sheetname; newer versions call it sheet_name. So with an old version, try:
import pandas as pd
df = pd.read_excel(file_with_data, sheetname=sheet_with_data)
OR
You can also use pd.ExcelFile instead:
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'in')
OR
xl = pd.ExcelFile(path)
# xl = pd.ExcelFile("Full_Path_of_file")
print(xl.sheet_names)  # e.g. ['in', 'in1', 'in2']
df = xl.parse("in")
df.head()
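A minimal sketch of the same idea with the workbook opened in a with block, so the file is closed afterwards, plus an explicit check that the sheet exists before parsing it. It builds a stand-in workbook in memory (sheet names "in", "in1", "in2" assumed from the output above) and requires openpyxl; with a real file, pass its path instead of the BytesIO object.

```python
import io

import pandas as pd

# Stand-in for the real file: an in-memory workbook with three sheets (requires openpyxl)
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    for name in ("in", "in1", "in2"):
        pd.DataFrame({"sheet": [name]}).to_excel(writer, sheet_name=name, index=False)

# ExcelFile works as a context manager and closes the workbook on exit
with pd.ExcelFile(io.BytesIO(buf.getvalue()), engine="openpyxl") as xl:
    print(xl.sheet_names)  # ['in', 'in1', 'in2']
    if "in" not in xl.sheet_names:
        raise ValueError(f"sheet 'in' not found; available sheets: {xl.sheet_names}")
    df = xl.parse("in")

print(df)
```

Printing sheet_names first is a quick way to spot a sheet that is actually named "In" or "in " rather than "in".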
OR
df = pd.read_excel(open('your_xls_xlsx_filename','rb'), sheet_name='Sheet 1')
# or select the sheet by its zero-based index
df = pd.read_excel(open('your_xls_xlsx_filename','rb'), sheet_name=1)
Note: choose the argument name according to your pandas version:
For older pandas versions (before 0.21): use sheetname
For newer pandas versions: use sheet_name
Situation 2:
Copying the address directly from the Security tab of the file's right-click Properties dialog can cause this problem. So copying and pasting a file path can produce this error even when nothing else is evidently wrong.
Solution:
It has nothing to do with backslashes versus forward slashes in the path, and it has nothing to do with whether the path contains. There are two solutions:
1. Type the path in manually.
2. Open the path in Explorer, then copy it from there.
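If you suspect a pasted path, repr() makes hidden characters visible and they can be stripped programmatically. A minimal sketch, assuming the culprit is an invisible Unicode format character such as U+202A (LEFT-TO-RIGHT EMBEDDING), a commonly reported cause when copying from the Security tab; clean_pasted_path is a hypothetical helper, not a pandas function.

```python
import unicodedata


def clean_pasted_path(path: str) -> str:
    """Drop invisible format characters (Unicode category 'Cf') from a pasted path."""
    return "".join(ch for ch in path if unicodedata.category(ch) != "Cf")


# Hypothetical pasted path with a hidden U+202A at the front
pasted = "\u202aC:\\Users\\me\\QVI_transaction_data.xlsx"
print(repr(pasted))   # repr escapes the hidden character, so it becomes visible
cleaned = clean_pasted_path(pasted)
print(repr(cleaned))
```

Stripping the whole 'Cf' category is a deliberately broad choice; it also removes other zero-width characters that can sneak in through copy and paste.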
General usage of the sheet_name parameter:
# sheet_name=None: all sheets are returned as a dict of DataFrames, keyed by sheet name
sheet = pd.read_excel('example.xls', sheet_name=None)
# An int selects one sheet by zero-based position and returns a single DataFrame
sheet = pd.read_excel('example.xls', sheet_name=0)
# A list such as [0, 1] returns a dict of DataFrames, keyed by the list items
sheet = pd.read_excel('example.xls', sheet_name=[0, 1])
# Sheets can also be selected by name, or by a mix of names and positions
sheet = pd.read_excel('example.xls', sheet_name='Sheet0')
sheet = pd.read_excel('example.xls', sheet_name=['Sheet0', 'Sheet1'])
sheet = pd.read_excel('example.xls', sheet_name=[0, 1, 'Sheet3'])
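To check these return types without an example.xls at hand, here is a small sketch that writes a two-sheet workbook to memory and reads it back three ways. It assumes openpyxl is installed (it writes .xlsx rather than .xls), and the sheet names "in" and "in1" are placeholders.

```python
import io

import pandas as pd

# Build a two-sheet .xlsx workbook in memory (requires openpyxl)
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="in", index=False)
    pd.DataFrame({"b": [3]}).to_excel(writer, sheet_name="in1", index=False)
data = buf.getvalue()


def read(sheet_name):
    # A fresh buffer per call, so each read starts at the beginning of the file
    return pd.read_excel(io.BytesIO(data), sheet_name=sheet_name, engine="openpyxl")


one = read(0)               # single DataFrame (first sheet)
several = read([0, "in1"])  # dict keyed by 0 and "in1"
everything = read(None)     # dict keyed by sheet name

print(type(one).__name__, sorted(several, key=str), list(everything))
```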
Related
I am trying to collect multiple CSV files into one Excel workbook, using each CSV file's name as its sheet name, but the loop does not keep the sheet from each iteration and I only get the last sheet.
for i in range(0,len(dir)):
for filee in os.listdir(dir):
if filee.endswith(".csv"):
file_path = os.path.join(dir, filee)
df = pd.read_csv(file_path, on_bad_lines='skip')
df.to_excel("output.xlsx",sheet_name=filee, index=False)
i=i+1
I have tried ExcelWriter, but I got an error with the file.
Could anyone help me fix this problem?
Regards
This code would raise a SyntaxError because the first for loop is not properly defined. Assuming that this is just an indentation problem, let's look at the for-loop body.
For each .csv file, the loop reads it into a pandas.DataFrame and writes it to output.xlsx. Calling to_excel with a file name creates a new workbook each time, so you overwrite the file on every iteration. That is why you only see the last sheet.
Please have a look at this question: Add worksheet to existing Excel file with pandas
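A minimal sketch of the fix: open a single pd.ExcelWriter before the loop and pass the writer (not a file name) to to_excel, so every CSV becomes a sheet of the same workbook. csvs_to_workbook is a hypothetical helper name; it assumes pandas 1.3 or later (for on_bad_lines) and an installed Excel writer engine such as openpyxl.

```python
import os
import tempfile

import pandas as pd


def csvs_to_workbook(csv_dir: str, output_path: str) -> None:
    """Write every .csv file in csv_dir to its own sheet of a single workbook."""
    # Open the writer ONCE, outside the loop, so all sheets end up in the same file
    with pd.ExcelWriter(output_path) as writer:
        for name in sorted(os.listdir(csv_dir)):
            if not name.endswith(".csv"):
                continue
            # on_bad_lines needs pandas >= 1.3
            df = pd.read_csv(os.path.join(csv_dir, name), on_bad_lines="skip")
            # Excel caps sheet names at 31 characters, so drop ".csv" and truncate
            sheet_name = os.path.splitext(name)[0][:31]
            df.to_excel(writer, sheet_name=sheet_name, index=False)


# Demo with two throwaway CSV files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for stem in ("first", "second"):
        pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(tmp, f"{stem}.csv"), index=False)
    out = os.path.join(tmp, "output.xlsx")
    csvs_to_workbook(tmp, out)
    sheets = pd.read_excel(out, sheet_name=None)

print(list(sheets))  # ['first', 'second']
```

Truncating to 31 characters can make two long file names collide; this sketch does not handle that case.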
Usually, the problem is the type of the sheet name. For example, in df.to_excel("Output.xlsx", sheet_name='1'), if I don't put the 1 in quotes, I get an error: the sheet name must always be a str.
For example, I have the CSV files file1.csv to file4.csv in the sample_data folder in Google Colab.
With the following code, I first read all of them into the dict df and then write them to the Excel file, each in its own sheet:
import pandas as pd
df = {}
for i in range(1, 5):
    df[i] = pd.read_csv('sample_data/file' + str(i) + '.csv')
with pd.ExcelWriter('output.xlsx') as writer:
    for i in range(1, 5):
        df[i].to_excel(writer, sheet_name=str(i))
It works fine for me and I don't get any errors.
You can use a dict comprehension to map each CSV's file name to its DataFrame, then pass the dict to a function that loops over it and writes each DataFrame to its own sheet.
from pathlib import Path
import pandas as pd
path = "/path/to/csv/files"
def write_sheets(file_map: dict) -> None:
    with pd.ExcelWriter(f"{path}/output.xlsx", engine="xlsxwriter") as writer:
        for sheet_name, df in file_map.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False)
file_mapping = {Path(file).stem: pd.read_csv(file) for file in Path(path).glob("*.csv")}
write_sheets(file_mapping)
I am importing a large group of Excel files, and the code that selects what to import is below.
df = pd.read_excel(file, sheet_name=['Sheet1', 'Sheet2'])
I know that each file uses either Sheet1 or Sheet2, but never both. This makes my code error out. Is there any way to tell pandas to try importing Sheet1 and, if that fails, try Sheet2?
Thanks for any help.
try:
    df = pd.read_excel(file, sheet_name='Sheet1')
except ValueError:  # recent pandas versions raise ValueError for a missing sheet name
    df = pd.read_excel(file, sheet_name='Sheet2')
Assuming your Excel files aren't too large to import everything, you could do this:
df = pd.read_excel(file, sheet_name=None)
That would return all the sheets in the file as a dict, where the key is sheet name and the value is the dataframe. You can then test for the key you want and use that sheet, and drop the rest.
(Edit: I'll note that this may be a heavy-handed approach, but I tried to generalize the answer to how to select one or more sheets when you aren't sure of their names)
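A lighter variant of the same idea uses pd.ExcelFile to look at the sheet names first and parse only the sheet that exists, instead of loading every sheet. A minimal sketch; read_first_available is a hypothetical helper name, the demo workbook is built in memory, and openpyxl is assumed to be installed.

```python
import io

import pandas as pd


def read_first_available(source, candidates=("Sheet1", "Sheet2"), **kwargs):
    """Parse the first sheet in `candidates` that exists in the workbook."""
    with pd.ExcelFile(source, **kwargs) as xl:
        for name in candidates:
            if name in xl.sheet_names:
                return xl.parse(name)
        raise ValueError(f"none of {list(candidates)} found; sheets are {xl.sheet_names}")


# Demo: a workbook that only contains "Sheet2"
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"value": [10, 20]}).to_excel(writer, sheet_name="Sheet2", index=False)

df = read_first_available(io.BytesIO(buf.getvalue()), engine="openpyxl")
print(df)
```

Checking sheet_names avoids relying on which exception type a given pandas version and engine raise for a missing sheet.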
I have a large set of data that I am trying to extract from multiple Excel files with multiple sheets using Python, and then write into a new Excel file. I am new to Python and have used various tutorials to come up with code that can help me automate the process. However, I am stuck and need some guidance on how to write the extracted data to a new Excel file. If someone could point me in the right direction, it would be greatly appreciated. See the code below:
import os
from pandas.core.frame import DataFrame
path = r"Path where all excel files are located"
os.chdir(path)
for WorkingFile in os.listdir(path):
if os.path.isfile(WorkingFile):
DataFrame = pd.read_excel(WorkingFile, sheet_name = None, header = 12, skipfooter = 54)
DataFrame.to_excel(r'Empty excel file where to write all the extracted data')
When I execute the code, I get the error "AttributeError: 'dict' object has no attribute 'to_excel'". I am not sure how to fix this error; any help would be appreciated.
A little more background on what I am trying to do: I have a folder with about 50 Excel files, and each file might have multiple sheets. The data I need is in a table of one row and 14 columns, located in the same place on every sheet of every file. I need to pull that data and compile it into a single Excel file. When I run the code above and add a print statement, it shows exactly the data I want, but writing it to Excel doesn't work.
Thanks for help in advance!
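Not sure why you're importing DataFrame instead of pandas; your code looks incomplete. The AttributeError happens because sheet_name=None makes read_excel return a dict of DataFrames (one per sheet), not a single DataFrame. The code below reads each file into a DataFrame and writes the combined result. (It does not include any checks for excluding non-Excel files, directories, etc.)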
import pandas as pd
import os
path = "Dir path to excel files"  # Path
frames = []  # one DataFrame per file
for file in os.listdir(path):
    data = pd.read_excel(os.path.join(path, file))  # read each file in the directory
    frames.append(data)
df = pd.concat(frames, ignore_index=True)  # DataFrame.append was removed in pandas 2.0
# process df
df.to_excel("path/file.xlsx")
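Since the question mentions several sheets per file, here is a sketch that keeps every sheet: it reads each workbook with sheet_name=None, tags each sheet's rows with its file and sheet name, and concatenates everything once at the end. combine_workbooks and the source_file/source_sheet column names are hypothetical; the question's header=12 and skipfooter=54 can be passed through **read_kwargs. The demo builds two small workbooks in a temporary folder and needs openpyxl.

```python
import os
import tempfile

import pandas as pd


def combine_workbooks(folder: str, **read_kwargs) -> pd.DataFrame:
    """Stack every sheet of every .xlsx file in `folder` into one DataFrame."""
    frames = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".xlsx"):
            continue  # skip anything that is not an .xlsx workbook
        # sheet_name=None returns {sheet name: DataFrame}, not a single DataFrame
        sheets = pd.read_excel(os.path.join(folder, name), sheet_name=None, **read_kwargs)
        for sheet_name, df in sheets.items():
            frames.append(df.assign(source_file=name, source_sheet=sheet_name))
    return pd.concat(frames, ignore_index=True)


# Demo: two workbooks with two sheets each, in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2):
        with pd.ExcelWriter(os.path.join(tmp, f"file{i}.xlsx")) as writer:
            pd.DataFrame({"x": [i]}).to_excel(writer, sheet_name="A", index=False)
            pd.DataFrame({"x": [i * 10]}).to_excel(writer, sheet_name="B", index=False)
    combined = combine_workbooks(tmp)  # e.g. combine_workbooks(path, header=12, skipfooter=54)
    combined.to_excel(os.path.join(tmp, "combined.xlsx"), index=False)

print(combined)
```

For real use, write the combined file outside the input folder; otherwise a second run would read it back in as one of the inputs.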
After saving my DataFrame to a CSV file in a specific location, the file doesn't appear there. Is there any reason why it might not be showing?
Here is the code to save my dataframe to csv:
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Even saving an empty DataFrame does not seem to work.
import pandas as pd
olympics={}
df = pd.DataFrame(olympics)
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
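One way to narrow this down is to resolve the target to an absolute path, write the file, and check that it exists immediately afterwards. If it does, the problem is more likely in where you are looking (for example, a OneDrive folder that has not synced yet, which is only a guess here) than in to_csv itself. A minimal diagnostic sketch; save_and_verify is a hypothetical helper, and the demo writes to a temporary folder instead of the OneDrive path.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_and_verify(df: pd.DataFrame, target: str) -> Path:
    """Write df to CSV, then confirm the file exists at the resolved absolute path."""
    out = Path(target).expanduser().resolve()  # removes any doubt about the working directory
    df.to_csv(out, index=False)
    if not out.is_file():
        raise FileNotFoundError(f"to_csv returned, but {out} does not exist")
    print(f"saved {out.stat().st_size} bytes to {out}")
    return out


with tempfile.TemporaryDirectory() as tmp:
    written = save_and_verify(pd.DataFrame({"athlete": ["A"], "medals": [1]}),
                              str(Path(tmp) / "export_dataframe.csv"))
    roundtrip = pd.read_csv(written)
```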
Thanks for the help!
I would rather use the module openpyxl. Example of saving:
import openpyxl
workbook = openpyxl.Workbook()
sheet = workbook.active
# Work on your workbook. Once finished:
workbook.save(file_name) # file_name is a variable you must define
Don't forget to install openpyxl with pip first!
I have a basic question about importing .xlsx files into Python. I have checked many answers on the same topic, but I still cannot import my files whatever I try. Here's my code and the error I receive:
import pandas as pd
import xlrd
file_location = 'C:\Users\cagdak\Desktop\python_self_learning\Coursera\sample_data.xlsx'
workbook = xlrd.open_workbook(file_location)
Error:
IOError: [Errno 2] No such file or directory: 'C:\\Users\\cagdak\\Desktop\\python_self_learning\\Coursera\\sample_data.xlsx'
With pandas you can read a column of an Excel file directly. Here is the code:
import pandas
df = pandas.read_excel('sample.xls')
#print the column names
print(df.columns)
#get the values for a given column
values = df['column_name'].values
#get a data frame with selected columns
FORMAT = ['Col_1', 'Col_2', 'Col_3']
df_selected = df[FORMAT]
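If you only need a few columns, read_excel's usecols parameter can restrict parsing to them up front instead of selecting after loading. A small sketch with a stand-in workbook built in memory; the column names follow the FORMAT list above, "Other" is a made-up extra column, and openpyxl is assumed to be installed.

```python
import io

import pandas as pd

# Stand-in workbook with an extra column we do not need (requires openpyxl)
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"Col_1": [1, 2], "Col_2": [3, 4], "Other": [5, 6]}).to_excel(writer, index=False)

# usecols restricts parsing to the listed columns
df_selected = pd.read_excel(io.BytesIO(buf.getvalue()), usecols=["Col_1", "Col_2"], engine="openpyxl")
print(df_selected.columns.tolist())  # ['Col_1', 'Col_2']
```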
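You should use raw strings or escape your backslashes instead, because in a normal string literal a backslash starts an escape sequence (in Python 3, the \U in 'C:\Users' is even a SyntaxError). For example:
file_location = r'C:\Users\cagdak\Desktop\python_self_learning\Coursera\sample_data.xlsx'
or
file_location = 'C:\\Users\\cagdak\\Desktop\\python_self_learning\\Coursera\\sample_data.xlsx'
Or use forward slashes, which Windows also accepts in Python file paths: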
file_location = 'C:/Users/cagdak/Desktop/python_self_learning/Coursera/sample_data.xlsx'
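The three spellings above produce the same path, which you can confirm without touching the disk. A small sketch using the standard library's pathlib.PureWindowsPath (so it also runs on macOS and Linux); building the path from parts avoids escaping altogether.

```python
from pathlib import PureWindowsPath

raw = r'C:\Users\cagdak\Desktop\python_self_learning\Coursera\sample_data.xlsx'
escaped = 'C:\\Users\\cagdak\\Desktop\\python_self_learning\\Coursera\\sample_data.xlsx'
forward = 'C:/Users/cagdak/Desktop/python_self_learning/Coursera/sample_data.xlsx'

print(raw == escaped)  # True: both contain single backslashes
# PureWindowsPath accepts / and \ as separators and renders them as \
print(str(PureWindowsPath(forward)) == raw)  # True

# Joining parts avoids escape sequences entirely
built = PureWindowsPath("C:/", "Users", "cagdak", "Desktop",
                        "python_self_learning", "Coursera", "sample_data.xlsx")
print(built == PureWindowsPath(raw))  # True
```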
As pointed out above, pandas can read Excel spreadsheets with its read_excel() function. However, it relies on external libraries, and which one it needs depends on the file format being read (Excel or OpenDocument). It selects one automatically by default, though you can specify one with the engine parameter. Here's an excerpt from the docs:
"xlrd" supports old-style Excel files (.xls).
"openpyxl" supports newer Excel file formats.
"odf" supports OpenDocument file formats (.odf, .ods, .odt).
"pyxlsb" supports Binary Excel files.
If the required library is not installed, you'll see an error message suggesting which library you need to install.
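If you want to fail early with a clear message, you can map the file extension to the engine named in the excerpt and check that its module is importable before calling read_excel. A sketch under assumptions: ENGINE_BY_SUFFIX, engine_for and is_installed are hypothetical helpers, not pandas APIs; the mapping follows the excerpt above plus .xlsm for openpyxl, and newer pandas versions may offer additional engines.

```python
import importlib.util
from pathlib import Path

# Extension -> read_excel engine, following the docs excerpt (hypothetical helper data)
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xlsb": "pyxlsb",
    ".odf": "odf",  # the "odf" module is provided by the odfpy package
    ".ods": "odf",
    ".odt": "odf",
}


def engine_for(path: str) -> str:
    """Return the engine name for a file's extension, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"no Excel engine known for {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


def is_installed(module: str) -> bool:
    """True if the module can be imported in this environment."""
    return importlib.util.find_spec(module) is not None


engine = engine_for("sample_data.xlsx")
print(engine, "installed" if is_installed(engine) else "missing")
# then, for example: pd.read_excel("sample_data.xlsx", engine=engine)
```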