Dynamically Parsing a worksheet in Pandas using Python 3 - python

My question is about parsing worksheets in pandas (Python 3).
Right now my code looks like this:
var = input("Enter the path for the Excel file you want to use: ")
import pandas as pd
xl = pd.ExcelFile(var)
df = xl.parse("HelloWorld")
df.head()
This parses the worksheet "HelloWorld" from the Excel file the user supplies. However, the worksheet is sometimes not called "HelloWorld", in which case the parsing code fails.
Does anyone know how to make "df" read the worksheet regardless of its name? There will always be only ONE worksheet in these Excel files, so I want my code to read whichever worksheet is in the file.
Thank you for the help!

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.excel.ExcelFile.parse.html
You can pass in the sheet number instead of the name.
var = input("Enter the path for the Excel file you want to use: ")
import pandas as pd
xl = pd.ExcelFile(var)
df = xl.parse(sheet_name=0)  # 0 = first sheet; older pandas versions spelled this keyword sheetname
df.head()
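Since the question asks for whatever the single sheet happens to be called, another option is to look the name up at run time. This is a minimal sketch, assuming the workbook really contains exactly one sheet. The helper names `parse_only_sheet` and `read_single_sheet` are made up for illustration; `ExcelFile.sheet_names` and `ExcelFile.parse` are the pandas attributes already used above.

```python
import pandas as pd


def parse_only_sheet(xl):
    """Parse the single worksheet of an already-open pd.ExcelFile."""
    names = xl.sheet_names  # list of worksheet names, in workbook order
    if len(names) != 1:
        raise ValueError(f"expected exactly one sheet, found {names}")
    return xl.parse(names[0])


def read_single_sheet(path):
    """Hypothetical wrapper: open the file at path and read its only sheet."""
    with pd.ExcelFile(path) as xl:
        return parse_only_sheet(xl)
```

With the code from the question, this would be called as `df = read_single_sheet(var)`. The explicit check is a design choice: it fails loudly if a file unexpectedly has more than one sheet.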

Related

adding multiple csv to an excel file with keeping the names of these csvs files

I am trying to collect multiple CSV files into one Excel workbook, using each CSV file's name as its sheet name, but the loop does not keep the sheet from each step and I end up with only the last sheet.
for i in range(0,len(dir)):
for filee in os.listdir(dir):
if filee.endswith(".csv"):
file_path = os.path.join(dir, filee)
df = pd.read_csv(file_path, on_bad_lines='skip')
df.to_excel("output.xlsx",sheet_name=filee, index=False)
i=i+1
I have tried ExcelWriter, but the file gave an error.
Could anyone help me fix this problem?
Regards
As posted, this code would raise an IndentationError (a kind of SyntaxError) because the loop bodies are not indented. Assuming that is only a formatting problem, consider the body of the for loop.
For each .csv file, the loop reads the file into a pandas.DataFrame and writes it to output.xlsx. Calling to_excel with a file name creates the file from scratch, so every iteration overwrites the previous one and only the last sheet survives.
See the related question "Add worksheet to existing Excel file with pandas".
Usually, the problem is the type of the sheet name. For example, in df.to_excel("Output.xlsx", sheet_name='1'), leaving the 1 unquoted raises an error. The sheet name must always be a str.
For example, I have CSV files named file1.csv through file4.csv in the sample_data folder in Google Colab.
With the following code, I first read all of them into a dict named df and then write them to the Excel file (in separate sheets).
import pandas as pd
df = {}
for i in range(1,5):
    df[i] = pd.read_csv('sample_data/file'+str(i)+'.csv')
with pd.ExcelWriter('output.xlsx') as writer:
    for i in range(1,5):
        df[i].to_excel(writer, sheet_name = str(i))
It works fine for me and I don't get any errors.
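On the sheet-name point: besides being a str, an Excel sheet name may be at most 31 characters long and may not contain any of the characters `[ ] : * ? / \`. File names used as sheet names, as in the question, can run into both limits. Below is a small sketch of a helper that derives a usable name from a CSV file name. `safe_sheet_name` is a hypothetical name, and replacing forbidden characters with an underscore is just one possible choice.

```python
from pathlib import Path

# Characters Excel does not allow in a worksheet name.
FORBIDDEN = set('[]:*?/\\')
MAX_LEN = 31  # Excel's limit on worksheet name length


def safe_sheet_name(file_name):
    """Turn a file name such as 'report.csv' into a valid sheet name."""
    stem = Path(file_name).stem  # drop the directory and the extension
    cleaned = "".join("_" if ch in FORBIDDEN else ch for ch in stem)
    return cleaned[:MAX_LEN] or "Sheet1"  # fall back if nothing is left
```

It could be used as `df.to_excel(writer, sheet_name=safe_sheet_name(filee), index=False)`. Note that truncation can make two long file names collide, which this sketch does not handle.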
You can use a dict comprehension to store each CSV's DataFrame under its file name (without the extension), then pass the dict to a function that loops over it and writes each DataFrame to its own sheet.
from pathlib import Path
import pandas as pd
path = "/path/to/csv/files"
def write_sheets(file_map: dict) -> None:
    with pd.ExcelWriter(f"{path}/output.xlsx", engine="xlsxwriter") as writer:
        # one sheet per CSV, named after the file
        for sheet_name, df in file_map.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False)
# map each CSV's name (without extension) to its DataFrame
file_mapping = {Path(file).stem: pd.read_csv(file) for file in Path(path).glob("*.csv")}
write_sheets(file_mapping)

Excel file conversion using AzureFunction python

I need Python code that runs in an Azure Function, reads multiple Excel files from a blob container, and converts several sheets ('outlet', 'product', 'delivery') of each workbook to CSV files.
For the output, I want each CSV file to be named after the A1 value of the Excel data and saved in a separate blob folder. (The corresponding sheets have the same format in every workbook, and the data consists of numbers and letters.)
I apologize for my inexperience, but I don't know how to write this in Python for an Azure Function. Please let me know.
Here is a rough sketch of the Python code I have in mind:
import pandas as pd
import openpyxl
from azure.storage.blob import BlobClient,BlobServiceClient,ContentSettings
connectionstring="XXXXXXXXXXXXXXXX"
#blob
container = "workspace/folder_excel"
save_container = "workspace/folder_csv"
excel_files="*.xlsm"
sheets = ['outlet','product','delivery']
#read_excelfile_list
# sketch only: excel_files is a pattern here, not yet a list of blob names
for excel_file in excel_files:
    #ExcelBook
    input_book = pd.ExcelFile(container + "/" + excel_file, engine="openpyxl")
    #excel_sheet
    #added A1cell number
    for sheet_name in sheets:
        sheet = pd.read_excel(input_book, sheet_name=sheet_name)
        sheet.to_csv(save_container + "/%s.csv" % sheet_name, index=False)
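One possible shape for the conversion step, with the blob download and upload left out entirely. This is a sketch under assumptions, not a tested Azure Function: the workbook is assumed to arrive as bytes, the helper names are hypothetical, and each sheet is read with `header=None` so that cell A1 is simply the first cell of the frame.

```python
import io

import pandas as pd

SHEETS = ["outlet", "product", "delivery"]  # sheet names from the question


def sheet_to_named_csv(raw):
    """Return (csv_file_name, csv_text) for one sheet read with header=None.

    With header=None the first spreadsheet row is row 0 of the frame,
    so cell A1 is raw.iat[0, 0].
    """
    a1_value = str(raw.iat[0, 0])
    csv_text = raw.to_csv(index=False, header=False)
    return f"{a1_value}.csv", csv_text


def convert_workbook(excel_bytes):
    """Convert the listed sheets of one workbook (given as bytes) to CSV text."""
    # A list passed as sheet_name makes read_excel return {sheet name: DataFrame}.
    frames = pd.read_excel(io.BytesIO(excel_bytes), sheet_name=SHEETS,
                           header=None, engine="openpyxl")
    return {sheet: sheet_to_named_csv(raw) for sheet, raw in frames.items()}
```

If the three sheets of a workbook share the same A1 value, the file names would collide, so adding the sheet name to the file name may be necessary. The returned CSV text would then be uploaded to the target folder with whichever blob client the function uses.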

Can't see csv file (converted from df) in files

After saving my dataframe to a csv in a specific location, the csv file doesn't appear in the location I saved it to. Is there any reason why it possibly is not showing?
Here is the code to save my dataframe to csv:
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Even saving an empty df does not seem to work.
import pandas as pd
olympics={}
df = pd.DataFrame(olympics)
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Thanks for the help!
I would rather use the module openpyxl. Example of saving:
import openpyxl
workbook = openpyxl.Workbook()
sheet = workbook.active
# Work on your workbook. Once finished:
workbook.save(file_name) # file_name is a variable you must define
Don't forget to install openpyxl with pip first!
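Before switching libraries, it may be worth confirming where the file actually went. The sketch below writes the CSV and immediately checks the resolved absolute path. The helper name `save_and_verify` and the temporary folder are placeholders for illustration, not the asker's setup.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_and_verify(df, target):
    """Write df to target as CSV and return the absolute path that was written."""
    path = Path(target).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    df.to_csv(path, index=False)
    if not path.is_file():
        raise FileNotFoundError(f"nothing was written at {path}")
    return path


demo_dir = Path(tempfile.mkdtemp())  # stand-in for the real folder
written = save_and_verify(pd.DataFrame({"medal": ["gold", "silver"]}),
                          demo_dir / "export_dataframe.csv")
print(written, written.stat().st_size, "bytes")
```

Printing the resolved path shows exactly which folder to look in, which helps when the path seen in a file browser differs from the one the script used.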

How to load win32com Excel worksheet to Pandas df?

I have the following code:
import pandas as pd
import win32com.client
excel_app = win32com.client.Dispatch("Excel.Application")
file_path = r"path to the file"
file_password = "file password"
workbook = excel_app.Workbooks.Open(file_path, Password=file_password)
sheet = workbook.Sheets("sheet name")
Now I'd like to take the sheet variable and load it into a Pandas df. I was trying to accomplish it via saving the sheet to a separate file and then reading it from Pandas, but it seems to be over-complicating the issue, as the file is both password protected and in .xlsm format, so re-opening it directly from Pandas isn't straightforward.
How do I do it?
The UsedRange property of the sheet will return an array that encompasses all the cells in the worksheet that have data.
df = pd.DataFrame(sheet.UsedRange())
The resulting DataFrame has the column number as the column headers and the row number as the index, both zero-based.
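If the first row of the sheet holds the column names, it can be promoted to the header instead of keeping numeric labels. This is a sketch that assumes `sheet.UsedRange()` returns a tuple of row tuples with at least one header row. `rows_to_frame` is a hypothetical helper, and the sample tuple stands in for the COM call so the snippet runs without Excel.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from a tuple of row tuples, using row 0 as the header."""
    header, *body = rows
    return pd.DataFrame(body, columns=list(header))


# Stand-in for what sheet.UsedRange() is assumed to return for a 3x2 range.
sample_rows = (("name", "score"), ("alice", 4.0), ("bob", 7.0))
frame = rows_to_frame(sample_rows)
```

With the real worksheet this would be `df = rows_to_frame(sheet.UsedRange())`.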

How to output dataframe values to an Excel file? [Python]

For the past few days I've been trying to do a relatively simple task but I'd always encounter some errors so I'd really appreciate some help on this. Here goes:
I have an Excel file which contains a specific column (Column F) that has a list of IDs.
What I want to do is for the program to read this excel file and allow the user to input any of the IDs they would like.
When the user types in one of the IDs, I want the program to return all the IDs that contain the text the user entered. After that I'd like to export those IDs to a new, separate Excel file where they are displayed in one column but in separate rows.
Here's my code so far, I've tried using arrays and stuff but nothing seems to be working for me :/
import pandas as pd
import numpy as np
import re
import xlrd
import os.path
import xlsxwriter
import openpyxl as xl;
from pandas import ExcelWriter
from openpyxl import load_workbook
# LOAD EXCEL TO DATAFRAME
xls = pd.ExcelFile('N:/TEST/TEST UTILIZATION/IA 2020/Dev/SCS-FT-IE-Report.xlsm')
df = pd.read_excel(xls, 'FT')
# GET USER INPUT (USE AD1852 AS EXAMPLE)
value = input("Enter a Part ID:\n")
print(f'You entered {value}\n\n')
i = 0
x = df.loc[i, "MFG Device"]
df2 = np.array(['', 'MFG Device', 'Loadboard Group','Socket Group', 'ChangeKit Group'])
for i in range(17367):
# x = df.loc[i, "MFG Device"]
if value in x:
df = np.array[x]
df2.append(df)
i += 1
print(df2)
# create excel writer object
writer = pd.ExcelWriter('N:/TEST/TEST UTILIZATION/IA 2020/Dev/output.xlsx')
# write dataframe to excel
df2.to_excel(writer)
# save the excel
writer.save()
print('DataFrame is written successfully to Excel File.')
Any help would be appreciated, thanks in advance! :)
It looks like you're doing much more than you need to do. Rather than monkeying around with xlsxwriter, pandas.DataFrame.to_excel is your friend.
Just do
df2.to_excel("output.xlsx")
You don't need xlsxwriter; df.to_excel() is enough. In your code, df2 is a NumPy array. Convert it to a pandas DataFrame first, with the index and columns you need, before writing it to Excel.
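The row-by-row loop can also be replaced by a boolean filter on the column, which keeps everything in a DataFrame so that to_excel works directly. This is a minimal sketch, assuming the IDs live in the "MFG Device" column as in the question. The function names and the sample data are made up for illustration.

```python
import pandas as pd


def matching_ids(df, text, column="MFG Device"):
    """Return the rows whose ID column contains text as a plain substring."""
    ids = df[column].astype(str)
    mask = ids.str.contains(text, regex=False, na=False)
    return df.loc[mask, [column]].reset_index(drop=True)


def export_matches(df, text, out_path):
    """Write the matching IDs to a new workbook, one ID per row."""
    matching_ids(df, text).to_excel(out_path, index=False)


# Tiny in-memory stand-in for the 'FT' sheet.
sample = pd.DataFrame({"MFG Device": ["AD1852JRS", "AD1853", "XAD1852-R"]})
result = matching_ids(sample, "AD1852")
```

With the question's data this would be `export_matches(df, value, "output.xlsx")`, where `df` is the frame read from the 'FT' sheet and `value` is the user's input.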
