Python function to import filename and city name - python

I'm very new to Python and trying my best to learn. I understand the concept of functions, and they don't seem complicated, but for some reason I have the worst time with them.
I have an Excel spreadsheet, and I need to open the file and read data from a specific sheet within it.
I set up the function like so:
def process_data(file, city):
    file_name = "../data/" + file  # path to file + file name
    sheet = city  # sheet name

process_data("Jan 10.xlsx", "Seattle")
but it doesn't work. Ultimately, I want to read this into a pandas DataFrame so I can manipulate the data. Can someone give a newbie a little guidance?
All help is greatly appreciated.

pip install pandas  # run in a terminal if pandas isn't installed
import pandas as pd  # importing pandas
df_sheet = pd.read_excel('file_name', sheet_name='sheet_name')  # creating dataframe
The first parameter is the file name, and the second parameter is the sheet name within that Excel file. Hope this helps.
See the official pandas documentation for read_excel().
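To tie this back to the function in the question: the original process_data builds the path and the sheet name but never reads the file or returns anything. A minimal sketch of a working version follows, assuming the workbook lives in ../data as in the question and that openpyxl is installed so pandas can read .xlsx files. The helper build_path and the data_dir parameter are illustrative names, not part of the original post.

```python
from pathlib import Path

import pandas as pd

# Assumption: same relative data folder as in the question.
DATA_DIR = Path("../data")


def build_path(file, data_dir=DATA_DIR):
    """Join the data folder and the workbook file name (hypothetical helper)."""
    return Path(data_dir) / file


def process_data(file, city, data_dir=DATA_DIR):
    """Read one sheet of the workbook and return it as a DataFrame.

    `city` is used as the sheet name, e.g. "Seattle".
    """
    # Read the requested sheet and hand the DataFrame back to the caller.
    return pd.read_excel(build_path(file, data_dir), sheet_name=city)


# Example call, mirroring the question:
# df = process_data("Jan 10.xlsx", "Seattle")
# print(df.head())
```

The key change is the return statement: without it, the call at the bottom of the question runs but gives nothing back to work with.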

Related

I need to make my code take on excel files (with dates) automatically, so I don't need to do it manually - Python/Excel

I am writing a cleaning function to clean my Excel files so I can analyse them. I get an Excel sheet every day, which is named for example:
"tourniquets_27.07.2022_raw.xls"
Because I get a new sheet every day, the date in the name changes to that day. For now I change the dates manually, but I would love to make this automatic. I also save the file as follows: "tourniquets_28.07.2022_cleaned.xls". The date needs to be changed here as well.
I tried the following code, but it gives an error. I tried other things as well, but this seems to be the closest to what I want.
import pandas as pd
import os
from pathlib import Path
from datetime import date
#assuming the files are in the directory:
folder = Path("C:/Users/JHA4/Desktop/Code/Tourniquets/Cleaned")
date_string = f"tourniquets_{date.today().month}.{date.today().day}.{date.today().year}_raw.xlsx"
xlsx_file = folder.glob(date_string)
#read in data
df = pd.read_excel(io=next(xlsx_file)
#Save it back to excel with a new name=
df.to_excel(io=next(xlsx_file))
Hope that I asked everything clearly!
Thank you in advance.
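One possible approach, sketched under a few assumptions: the date in the file name is day.month.year (as in "tourniquets_27.07.2022_raw.xls"), the raw and cleaned files carry the same date, and the cleaned file is written as .xlsx because, as far as I know, recent pandas versions no longer write the old .xls format. Reading .xls also needs the xlrd package. The function names dated_names and clean_for_day are made up for illustration.

```python
from datetime import date
from pathlib import Path

import pandas as pd

# Folder taken from the question; adjust to your own setup.
FOLDER = Path("C:/Users/JHA4/Desktop/Code/Tourniquets/Cleaned")


def dated_names(day, prefix="tourniquets"):
    """Build the raw and cleaned file names for a given date."""
    stamp = day.strftime("%d.%m.%Y")  # zero-padded day.month.year
    return f"{prefix}_{stamp}_raw.xls", f"{prefix}_{stamp}_cleaned.xlsx"


def clean_for_day(day=None, folder=FOLDER):
    """Read the raw file for `day` (default: today) and save a cleaned copy."""
    day = day or date.today()
    raw_name, cleaned_name = dated_names(day)
    df = pd.read_excel(folder / raw_name)
    # ... your cleaning steps go here ...
    df.to_excel(folder / cleaned_name, index=False)
    return folder / cleaned_name
```

Using strftime keeps the leading zeros ("07" rather than "7"), which the f-string with date.today().month in the question would drop. The question's code also puts the month before the day and looks for .xlsx while the example file is .xls, so the glob would not match the example name.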

Extracting data from excel using python and writing to an empty excel file

I have a large set of data that I am trying to extract from multiple Excel files, each with multiple sheets, using Python, and then write that data into a new Excel file. I am new to Python and have used various tutorials to come up with code that can help me automate the process. However, I am stuck and need some guidance on how to write the extracted data to a new Excel file. If someone could point me in the right direction, it would be greatly appreciated. See the code below:
import os
from pandas.core.frame import DataFrame
path = r"Path where all excel files are located"
os.chdir(path)
for WorkingFile in os.listdir(path):
    if os.path.isfile(WorkingFile):
        DataFrame = pd.read_excel(WorkingFile, sheet_name = None, header = 12, skipfooter = 54)
        DataFrame.to_excel(r'Empty excel file where to write all the extracted data')
When I execute the code I get the error "AttributeError: 'dict' object has no attribute 'to_excel'". I am not sure how to fix this, so any help would be appreciated.
A little more background on what I am trying to do: I have a folder with about 50 Excel files, and each file might have multiple sheets. The data I need is in a table that consists of one row and 14 columns and is in the same location in each file and each sheet. I need to pull that data and compile it into a single Excel file. When I run the code above and add a print statement, it shows me exactly the data I want, but when I try to write it to Excel it doesn't work.
Thanks for help in advance!
Not sure why you're importing DataFrame instead of pandas; your code never defines pd. The code below should clear up your doubts. (It does not include any checks to skip non-Excel files, subdirectories, etc.)
import pandas as pd
import os
path = "Dir path to excel files"  # Path
frames = []  # Collect one DataFrame per file
for file in os.listdir(path):
    data = pd.read_excel(os.path.join(path, file))  # Read each file from dir
    frames.append(data)
# DataFrame.append was removed in pandas 2.0, so combine with pd.concat
df = pd.concat(frames, ignore_index=True)
# process df
df.to_excel("path/file.xlsx")
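On the 'dict' error itself: with sheet_name=None, pd.read_excel returns a dictionary mapping each sheet name to its own DataFrame, and a dictionary has no to_excel method. Below is a rough sketch of how the sheets could be flattened into one table before writing. The function names and the added source_file and sheet columns are illustrative choices, not something from the question.

```python
import os

import pandas as pd


def combine_sheets(sheets, source):
    """Stack the sheets of one workbook into a single DataFrame.

    `sheets` is the dict returned by pd.read_excel(..., sheet_name=None).
    Two helper columns record where each row came from.
    """
    tagged = [
        frame.assign(source_file=source, sheet=name)
        for name, frame in sheets.items()
    ]
    return pd.concat(tagged, ignore_index=True)


def compile_folder(path, **read_kwargs):
    """Read every Excel file in `path` and return all sheets as one DataFrame."""
    frames = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full) and name.lower().endswith((".xls", ".xlsx")):
            sheets = pd.read_excel(full, sheet_name=None, **read_kwargs)
            frames.append(combine_sheets(sheets, name))
    return pd.concat(frames, ignore_index=True)


# Example, using the options from the question:
# result = compile_folder(path, header=12, skipfooter=54)
# result.to_excel("compiled.xlsx", index=False)
```

Writing once after the loop also avoids overwriting the output file on every iteration, which the loop in the question would do.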

How do I execute this python code automatically in in excel cells?

I need to extract the domain from a list of websites in an Excel sheet (for example: http: //www.example.com/example-page, http ://test.com/test-page) and reduce each URL to its domain (example.com, test.com). I have the code part figured out, but I still need to get these commands to run on the cells of an Excel column automatically.
here's_the_code
I think you should read the data in as a pandas DataFrame (pd.read_excel), turn your code into a function, and then apply it to the URL column (Series.apply). After that it is easy to save back to Excel with DataFrame.to_excel().
Of course, you will need pandas installed.
Something like:
import pandas as pd
dframe = pd.read_excel(io='', sheet_name='')  # fill in your file path and sheet name
dframe['domains'] = dframe['urls col name'].apply(your_function)  # your_function: the code you already have
dframe.to_excel('your path')
Best
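Since the asker's own code is not shown, here is one hypothetical version of such a function, using urllib.parse from the standard library. It assumes each URL includes its scheme (http:// or https://) and that a leading "www." should be dropped. The column name "urls" is a placeholder.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_domain(url):
    """Return the host part of a URL without a leading 'www.'."""
    host = urlparse(url.strip()).netloc  # empty if the URL has no scheme
    return host[4:] if host.startswith("www.") else host


# Small in-memory example standing in for the Excel column.
dframe = pd.DataFrame(
    {"urls": ["http://www.example.com/example-page", "http://test.com/test-page"]}
)
dframe["domains"] = dframe["urls"].apply(extract_domain)
print(dframe)
```

With a real workbook, the DataFrame would come from pd.read_excel and be written back with dframe.to_excel, as in the answer above.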

How to create a hierarchical csv file?

I have the following invoice data (N invoices) in Excel, and I want to create a CSV of that file so that it can be imported whenever needed. How can I achieve this?
Here is a screenshot:
Assuming you have a folder "excel" full of Excel files within your project directory, and another folder "csv" where you intend to put the generated CSV files, you can batch-convert all the Excel files in the "excel" directory into CSV using pandas.
This assumes you already have pandas installed on your system. Otherwise, you can install it via: pip install pandas. The commented snippet below illustrates the process:
# IMPORT PANDAS AND OS
import pandas as pd
import os
# THE TWO FOLDERS DESCRIBED ABOVE, RELATIVE TO THE PROJECT DIRECTORY
excelDir = "excel"
csvDir = "csv"
# OUR GOAL IS:::
# LOOP THROUGH THE FOLDER: excelDir.....
# AT EACH ITERATION IN THE LOOP, CHECK IF THE CURRENT FILE IS AN EXCEL FILE,
# IF IT IS, SIMPLY CONVERT IT TO CSV AND SAVE IT:
for fileName in os.listdir(excelDir):
    # DO WE HAVE AN EXCEL FILE?
    if fileName.endswith(".xls") or fileName.endswith(".xlsx"):
        # IF WE DO; THEN WE DO THE CONVERSION USING PANDAS...
        targetXLFile = os.path.join(excelDir, fileName)
        targetCSVFile = os.path.join(csvDir, fileName) + ".csv"
        # NOW, WE READ "IN" THE EXCEL FILE
        dFrame = pd.read_excel(targetXLFile)
        # ONCE WE ARE DONE READING, WE CAN SIMPLY SAVE THE DATA TO CSV
        dFrame.to_csv(targetCSVFile)
Hope this does the Trick for you.....
Cheers and Good-Luck.
Instead of putting the total output into one CSV, you could go with the following steps:
Convert your Excel content to CSV files or CSV objects.
Tag each object with its invoice ID and save it into a dictionary.
Your dictionary data structure could look like {'invoice-id': csv-object, 'invoice-id2': csv-object2, ...}
Write a custom function that reads a CSV object and gives you the name, product ID, quantity, etc.
Hope this helps.
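The dictionary idea above could be sketched roughly as follows. The screenshot of the data is not available here, so the column names (invoice_id, name, product_id, qty) and the sample rows are pure assumptions for illustration.

```python
import pandas as pd


def split_by_invoice(frame, id_column="invoice_id"):
    """Return a dict mapping each invoice ID to the rows of that invoice."""
    return {
        invoice_id: group.reset_index(drop=True)
        for invoice_id, group in frame.groupby(id_column)
    }


# Made-up rows standing in for the Excel sheet.
invoices = pd.DataFrame(
    {
        "invoice_id": ["INV-1", "INV-1", "INV-2"],
        "name": ["Alice", "Alice", "Bob"],
        "product_id": ["P10", "P11", "P10"],
        "qty": [2, 1, 5],
    }
)

by_invoice = split_by_invoice(invoices)

# Each value is a DataFrame; to_csv with no path returns the CSV text.
csv_objects = {key: group.to_csv(index=False) for key, group in by_invoice.items()}
```

Each entry can then be written to its own file or parsed again later with pd.read_csv.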

how add link to excel file using python

I'm generating a CSV file that is opened in Excel and converted to xlsx manually.
The CSV contains some paths to .txt files.
Is it possible to build the file paths in such a way that, when the CSV is converted to xlsx, they become clickable hyperlinks?
Thanks.
I would be interested to understand your workflow a bit better, but to try and help with your specific request:
The HYPERLINK solution proposed in the comments looks like a good one.
If you are able to implement that upstream in the CSV generation step, then great.
If not, and/or you are interested in automating the conversion process, consider using the pandas library:
Create a DataFrame object from the CSV using the pandas.read_csv method.
Convert your paths to HYPERLINKs.
Write back to xlsx using the pandas.DataFrame.to_excel method.
E.g. if you have a file original.csv and the relevant column header is file_paths:
import pandas as pd
df = pd.read_csv('original.csv')
df['file_paths'] = '=HYPERLINK("' + df['file_paths'] + '")'
df.to_excel('new.xlsx', index=False)
Hope that helps!
Jon
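For the "upstream" option mentioned above, a minimal sketch using only the standard csv module might look like this. It assumes Excel evaluates cells that start with "=" when it opens the CSV, which is its usual behaviour but worth checking on your setup. The function names and the file_paths header are illustrative.

```python
import csv
import io


def hyperlink_formula(path):
    """Wrap a path in an Excel HYPERLINK formula."""
    # Inside an Excel string literal, a double quote is escaped by doubling it.
    escaped = path.replace('"', '""')
    return f'=HYPERLINK("{escaped}")'


def write_links_csv(paths, handle, header="file_paths"):
    """Write one HYPERLINK formula per row to an open text file handle."""
    writer = csv.writer(handle)  # takes care of quoting the formula cells
    writer.writerow([header])
    for path in paths:
        writer.writerow([hyperlink_formula(path)])


# Example with an in-memory buffer; use open("links.csv", "w", newline="") for a real file.
buffer = io.StringIO()
write_links_csv(["C:/logs/run1.txt", "C:/logs/run2.txt"], buffer)
print(buffer.getvalue())
```

Letting csv.writer do the quoting matters here, because the formula itself contains double quotes.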
