How to read excel files in a for loop with openpyxl?

This seems tricky for me. Let's say I have, nested in a directory tree, an excel file with a few non-empty columns. I want to get the sum of all values located in column F with openpyxl:
file1.xlsx
A   B   C   D   E   F
                    5
                    7
                    11
                    17
                    20
                    29
                    34
My take on it would be as follows, but it is wrong:
import os
from openpyxl import load_workbook
directoryPath=r'C:\Users\MyName\Desktop\MyFolder' #The main folder
os.chdir(directoryPath)
folder_list=os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath): #Traversing the sub folders
    for name in file:
        if name.endswith(".xlsx"):
            filename = os.path.join(folders, name)
            wb=load_workbook(filename, data_only=True)
            ws=wb.active
            cell_range = ws['F1':'F7'] #Selecting the slice of interest
            sumup=0
            for row in cell_range:
                sumup=sumup+cell.value
While running this I get NameError: name 'cell' is not defined. How to work around this?

The main problem is that you are only iterating through the rows, not the cells (columns) within each row. ws['F1':'F7'] returns a tuple of rows, and each row is itself a tuple of cells, so the name cell is never defined in your loop.
Replace the last two lines of your code with this:
for row in cell_range: # This iterates through rows 1-7
    for cell in row: # This iterates through the cells (columns) in that row
        value = cell.value
        sumup += value
You mentioned that you didn't think this was running through each of your Excel files. That is easy to check. Remove all code after
ws=wb.active
and add
print(name + ' : ' + ws.title)
This prints each Excel file name together with the title of its active sheet. If it prints more than one line, the walk is clearly finding and opening the Excel files.
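Putting the fix together, the whole task can be sketched as below. This is a minimal sketch, not the asker's exact script: the helper names find_xlsx_files, sum_numeric and sum_column_f are made up for illustration, the range F1:F7 is taken from the question, empty or non-numeric cells are assumed to be skippable, and values_only assumes openpyxl 2.6 or later.

```python
import os


def find_xlsx_files(root):
    """Yield the full path of every .xlsx file below root (hypothetical helper)."""
    for folder, _sub_folders, files in os.walk(root):
        for name in files:
            if name.endswith(".xlsx"):
                yield os.path.join(folder, name)


def sum_numeric(values):
    """Add up the numeric values, skipping None, text and booleans."""
    return sum(
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )


def sum_column_f(path, first_row=1, last_row=7):
    """Sum F<first_row>:F<last_row> on the active sheet of one workbook."""
    # Imported here so the two helpers above work without openpyxl installed.
    from openpyxl import load_workbook

    ws = load_workbook(path, data_only=True).active
    rows = ws.iter_rows(min_row=first_row, max_row=last_row,
                        min_col=6, max_col=6, values_only=True)
    # Each row is a 1-tuple because only column F (column 6) is requested.
    return sum_numeric(value for (value,) in rows)
```

Calling sum_column_f(path) for each path yielded by find_xlsx_files(r'C:\Users\MyName\Desktop\MyFolder') gives one total per workbook; for the sample file1.xlsx that total would be 123.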

Related

Averaging a number of files from a folder

I'm trying to write a function that pulls from a folder path, reads the files (each is a 2 by inf array) in sets of n, averages the second row of each file by column and writes those results out to an excel file. I expect this to loop until I have reached the end of the files in the folder.
For example, the function is given a folder path and an n value, i.e. (path, 2), and each of the following arrays would be a different file in that folder. The code would average the second row of each array in a set and write the averages out row by row.
Example:
[1,2;3,4] [1,2;5,6]
[1,2;7,8] [1,2;9,10]
[1,2;3,4] [1,2;9,10]
would output in an excel file:
4 5
8 9
6 7
This is my current code:
def fileavg(path,n):
    import numpy as np
    import xlsxwriter
    from glob import glob
    workbook = xlsxwriter.Workbook('Test.xlsx')
    worksheet = workbook.add_worksheet()
    row=0
    glob.iglob(path) #when inputting path name begin with r' and end with a '
    for i in range(0,len(1),n):
        f=yield 1[i:i +n]
        A=np.mean(f(1),axis=1)
        for col, data in enumerate(A):
            worksheet.write_column(row, col, data)
        row +=1
I receive a generator object error when I attempt to run the function. Please let me know what this means and where any mistakes might be as I'm quite new to python.
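The "generator object" message most likely comes from the yield inside fileavg: any function that contains yield returns a generator when called, and its body does not run until the generator is iterated. The averaging step itself can be sketched without yield. This is only a sketch in plain Python; the name average_second_rows is made up, and it assumes each file has already been loaded as a two-row list of lists.

```python
def average_second_rows(arrays, n):
    """Average the second row of each consecutive group of n arrays, column by column.

    arrays: list of two-row arrays, e.g. [[1, 2], [3, 4]] (assumed input shape).
    Returns one averaged row per group.
    """
    averaged = []
    for start in range(0, len(arrays), n):
        group = arrays[start:start + n]
        second_rows = [array[1] for array in group]
        # zip(*rows) walks the rows column by column.
        averaged.append([sum(col) / len(col) for col in zip(*second_rows)])
    return averaged


files = [
    [[1, 2], [3, 4]], [[1, 2], [5, 6]],
    [[1, 2], [7, 8]], [[1, 2], [9, 10]],
    [[1, 2], [3, 4]], [[1, 2], [9, 10]],
]
print(average_second_rows(files, 2))
# prints [[4.0, 5.0], [8.0, 9.0], [6.0, 7.0]]
```

Each returned row could then be written to the sheet with xlsxwriter's worksheet.write_row(row_index, 0, averaged_row).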

Copy pasting a excel column from one excel document to another

I am going crazy here. My code works, but the "ws2.cell(row=2, column=2).value = c.value" line is not saving no matter what I do. I keep getting an "'int' object has no attribute 'value'" error message. Any ideas or suggestions?
import openpyxl as xl
from openpyxl import load_workbook
# opens the source excel file
# "C:\Users\wwwya\Desktop\mkPox.xlsx" <-- needs double backslashes for xl to understand
mkPox ="C:\\Users\\wwwya\\Desktop\\mkPox.xlsx"
wbMonkey1 = xl.load_workbook(mkPox)
ws1 = wbMonkey1.worksheets[0]
# opens the destination excel file
mkPaste ="C:\\Users\\wwwya\\Desktop\\mkPaste.xlsx"
wbPaste2 = xl.load_workbook(mkPaste)
ws2 = wbPaste2.active
# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
# copying the cell values from source
# excel file to destination excel file
for row in range(2, mr + 1):
    for column in "B": #Here you can add or reduce the columns
        cell_name = "{}{}".format(column, row)
        c = ws1[cell_name].value # the value of the specific cell
        print(c)
        # writing the read value to destination excel file
        ws2.cell(row=2, column=2).value = c.value
# saving the destination excel file
wbPaste2.save(str(mkPaste))

Your code had a couple of issues around this section
c = ws1[cell_name].value # the value of the specific cell
print(c)
# writing the read value to destination excel file
ws2.cell(row=2, column=2).value = c.value
You already assigned c the 'value' of the cell with ws1[cell_name].value, so c is a plain value (here an int) equal to the cell's contents and has no 'value' attribute of its own. When you assign to the cell on the 2nd sheet, you just want the variable 'c', as @norie indicated.
The next issue in that section is that the row and column for ws2.cell never change. Whatever you write to the 2nd sheet therefore always goes to cell 'B2', which makes the iteration through the 1st sheet a waste of time: only cell 'B2' will have a value, and it will come from the last cell in column 'B' of the 1st sheet.
Also, there is no need to cast to string in wbPaste2.save(str(mkPaste)), since mkPaste is already a string. The file name itself must stay: openpyxl's save() requires it, even when saving back to the same file.
The code example below shows how you can simplify the whole operation to a few lines.
Note: the loop uses enumerate to create two variables that update on each iteration.
for enum, c in enumerate(ws1['B'][1:], 2):
enum is used as the row position in ws2. The '2' in the enumerate call sets the initial value of enum to 2, so the first row written on the 2nd sheet is row 2.
c is the cell object from ws1, column 'B'. The loop starts at the second cell because of the [1:] slice, in line with your code starting the copy from row 2.
There is no need for intermediary variables; just assign each cell in the 2nd sheet the value of the corresponding cell in the 1st sheet, then save the file.
import openpyxl as xl
mkPox ="C:\\Users\\wwwya\\Desktop\\mkPox.xlsx"
wbMonkey1 = xl.load_workbook(mkPox)
ws1 = wbMonkey1.worksheets[0]
# opens the destination excel file
mkPaste ="C:\\Users\\wwwya\\Desktop\\mkPaste.xlsx"
wbPaste2 = xl.load_workbook(mkPaste)
ws2 = wbPaste2.active
for enum, c in enumerate(ws1['B'][1:], 2):
    ws2.cell(row=enum, column=c.column).value = c.value
# saving the destination excel file (save() needs the file name)
wbPaste2.save(mkPaste)
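To see what the enumerate call with the [1:] slice and a start of 2 produces, here is a small sketch that uses a plain list in place of the openpyxl column. The list contents are invented stand-ins for cells B1 to B4.

```python
# Stand-in for ws1['B']: one entry per cell in column B, header first.
column_b = ["header", "first", "second", "third"]

# [1:] skips the header cell; the start value 2 makes the counter match the sheet row.
pairs = list(enumerate(column_b[1:], 2))
print(pairs)
# prints [(2, 'first'), (3, 'second'), (4, 'third')]
```

Each pair is (destination row, source item), which is why the loop can write straight to row enum on the second sheet.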

content from multiple txt files into single excel file using python

If I have for example 3 txt files that looks as follows:
file1.txt:
a 10
b 20
c 30
file2.txt:
d 40
e 50
f 60
file3.txt:
g 70
h 80
i 90
I would like to read this data from the files and create a single excel file that will look like this:
Specifically in my case I have 100+ txt files that I read using glob and loop.
Thank you

There's a bit of logic involved in getting the output you need.
First, each input file is processed into its own list of rows. You might need to adjust this logic depending on the actual contents of the files, since you need to be able to split each line into columns. For the samples provided my logic works.
I added a safety check to see if the input files have the same number of rows. If they don't, it will seriously mess up the resulting Excel file. You'll need to add some logic if a length mismatch happens.
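One way to handle a length mismatch, as a sketch only, is to pad the shorter files with blank rows before merging. The helper name pad_to_same_length and the empty-string filler are assumptions rather than part of the code in this answer, and the sketch assumes every row has two columns, as in the sample files.

```python
def pad_to_same_length(tables, width=2, filler=""):
    """Return copies of the tables, each extended with blank rows to the longest length."""
    longest = max((len(table) for table in tables), default=0)
    blank_row = [filler] * width
    return [
        table + [list(blank_row) for _ in range(longest - len(table))]
        for table in tables
    ]


tables = [
    [["a", "10"], ["b", "20"], ["c", "30"]],
    [["d", "40"]],
]
print(pad_to_same_length(tables)[1])
# prints [['d', '40'], ['', ''], ['', '']]
```

With the padding applied to the list of parsed files, the equal-length check would pass and the row-by-row merge could proceed unchanged.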
For the writing to the excel file, it's very easy using pandas in combination with openpyxl. There are likely more elegant solutions, but I'll leave it to you.
I'm referencing some SO answers in the code for further reading.
requirements.txt
pandas
openpyxl
main.py
# we use pandas for easy saving as XLSX
import pandas as pd
filelist = ["file01.txt", "file02.txt", "file03.txt"]
def load_file(filename: str) -> list:
    result = []
    with open(filename) as infile:
        # splitlines() is OS agnostic and removes EOL characters
        for line in infile.read().splitlines():
            # split() with no argument splits on any run of whitespace
            result.append(line.split())
    return result
loaded_files = []
for filename in filelist:
    loaded_files.append(load_file(filename))
# you will want to check if the files have the same number of rows
# it will break stuff if they don't, you could fix it by appending empty rows
# stolen from:
# https://stackoverflow.com/a/10825126/9267296
len_first = len(loaded_files[0]) if loaded_files else None
if not all(len(i) == len_first for i in loaded_files):
    print("length mismatch")
    exit(419)
# generate empty list of lists so we don't get index error below
# stolen from:
# https://stackoverflow.com/a/33990699/9267296
result = [ [] for _ in range(len(loaded_files[0])) ]
for f in loaded_files:
    for index, row in enumerate(f):
        result[index].extend(row)
        result[index].append('')
# trim the last empty column
result = [line[:-1] for line in result]
# write as excel file
# stolen from:
# https://stackoverflow.com/a/55511313/9267296
# note that there are some other options on this SO question, but this one
# is easily readable
df = pd.DataFrame(result)
with pd.ExcelWriter("output.xlsx") as writer:
    df.to_excel(writer, sheet_name="sheet_name_goes_here", index=False)

Creating new excel file with transformations from existing excel file (using python)

I have an xlsx file, df, that contains a large amount of data. I wish to extract data from a particular cell and create a new xlsx file that contains this extracted data, along with a date.
Here is the file, df:
A B C #headers in excel
1 2 3
Desired output:
I wish to extract the number 3 from C1 (column C, row 1) and then create a new file, df2, which looks like the following -
Date Value
1/1/2020 3
This is what I am doing:
import xlrd #package for working with excel files
import xlwt #allows you to create a new file
df = pd.read_excel(df.xlsx, sheetname="Sheet1") #reading in my .xlsx file
worksheet = workbook.sheet_by_index(0) #one sheet that I am iterating over
sheet.cell(0, 2).value #extracting the value in the first row, 2nd column
sheet.write(0, 2) #Inserting data in 1st row and 2nd Column
However, I am stuck on adding a particular date within the newly created file
Any suggestion is appreciated

openpyxl is my favorite library for working with Excel from Python. I've used it in a company project for importing data from and exporting data to Excel.
It is a Python library for reading and writing Excel files (with extension xlsx/xlsm/xltx/xltm).
First, to install this package, run this command in a terminal:
sudo pip3 install openpyxl
Let's give you an example for how it works.
Input Excel File
Python Code
Print the first column value
# importing openpyxl module
import openpyxl
# Give the location of the file
path = "C:\\Users\\Admin\\Desktop\\demo.xlsx"
# workbook object is created
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
m_row = sheet_obj.max_row
# Loop will print all values
# of first column
for i in range(1, m_row + 1):
    cell_obj = sheet_obj.cell(row = i, column = 1)
    print(cell_obj.value)
OUTPUT
STUDENT 'S NAME
ANKIT RAI
RAHUL RAI
PRIYA RAI
AISHWARYA
HARSHITA JAISWAL
Reference:
Reading an excel file using Python openpyxl module
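For the task in the question (take one cell's value and write it next to a date in a new workbook), a minimal openpyxl sketch might look like this. The function names, the default cell reference "C1" and the file names in the usage line are illustrative assumptions; the date is written as a real date value, so its display format is left to Excel.

```python
import datetime


def build_rows(value, when):
    """Return the header row and one data row for the new sheet."""
    return [("Date", "Value"), (when, value)]


def extract_to_new_file(src_path, dst_path, when, cell="C1"):
    """Copy one cell from src_path into a new workbook alongside a date."""
    # Imported here so build_rows can be tried without openpyxl installed.
    from openpyxl import Workbook, load_workbook

    value = load_workbook(src_path, data_only=True).active[cell].value
    new_wb = Workbook()
    new_ws = new_wb.active
    for row in build_rows(value, when):
        new_ws.append(row)  # append() writes to the next free row
    new_wb.save(dst_path)


print(build_rows(3, datetime.date(2020, 1, 1)))
# prints [('Date', 'Value'), (datetime.date(2020, 1, 1), 3)]
```

A call such as extract_to_new_file("df.xlsx", "df2.xlsx", datetime.date(2020, 1, 1)) would then produce the two-row file described in the question, assuming the value really sits in C1.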

Developing a Counter for Outputting Data Info into a specific range of excel rows & columns?

I am a beginner in Python with just basic fundamentals under my belt, ie. loops and modules. I have some data that I want to automate by exporting it as text into specific Microsoft excel cells.
Specifically, I have 3 folders each with image files in them, a total of 10 image files. My goal is to make a code that opens up excel and outputs the name of the folder path in each descending row of Column A and the respective image file name in descending rows of column B.
So far, I have defined the folder and path, and have allowed my code to open up Excel and name the sheet and put a title in. I encountered my problem when trying to individually iterate each folder/file into a new cell. I tried using the range function, but it doesn't work with strings and I feel like a simple counter variable would work, but again, excel column and row names are strings.
Here is my code so far:
import win32com.client
import sys, os, string, arcpy
data_folder = "F:\\School\\GEOG_390\\Week11\\data"
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()
print(xlApp.Worksheets("Sheet1").Name)
xlApp.Worksheets("Sheet1").Range("A1").Value= "Data Files:"
for root, folders, files, in os.walk(data_folder):
    for folder in folders:
        workspace = os.path.join(root, folder)
        print( "Processing" + " " + workspace)
        arcpy.env.workspace = workspace
        rasters = arcpy.ListRasters("*", "IMG")
        for raster in rasters:
            arcpy.BuildPyramids_management(raster)
            arcpy.CalculateStatistics_management(raster)
            print(raster)
            sheet = xlApp.Worksheets("Sheet1")
            sheet.Range("A2").Value = "Folder:" + folder
            sheet.Range("B2").Value = "Raster:" + raster
            print(sheet.Range("A2").Value)
            print(sheet.Range("B2").Value)
I need the code to put folder name 1 in cell A2 and the image file name in B2, and from there on, folder 2 in cell A3 and file 2 in cell B3, all the way until cells A11 and B11.

If I understand the problem correctly, you are having trouble figuring out how to increment the names of the cells (for example, "A2" and "B2") each time you process a file. My apologies if that is not the issue.
(1) Before the first for loop, declare a variable cell_row that will track the row number where the next pair of cells will be created. It starts at 2.
cell_row = 2
(2) Modify the end of your loop as follows:
folder_cell = "A" + str(cell_row)
raster_cell = "B" + str(cell_row)
sheet = xlApp.Worksheets("Sheet1")
sheet.Range(folder_cell).Value = "Folder:" + folder
sheet.Range(raster_cell).Value = "Raster:" + raster
print(sheet.Range(folder_cell).Value)
print(sheet.Range(raster_cell).Value)
cell_row += 1
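The same counter idea can be tried without Excel. The sketch below only builds the cell addresses and values; the folder and raster names are invented, the function name plan_cells is hypothetical, and enumerate with a start of 2 stands in for the manual cell_row counter.

```python
def plan_cells(pairs, first_row=2):
    """Map each (folder, raster) pair to its A/B cell addresses and values."""
    plan = []
    for row, (folder, raster) in enumerate(pairs, start=first_row):
        plan.append(("A" + str(row), "Folder:" + folder))
        plan.append(("B" + str(row), "Raster:" + raster))
    return plan


pairs = [("folder1", "img1.img"), ("folder1", "img2.img"), ("folder2", "img3.img")]
for address, value in plan_cells(pairs):
    print(address, value)
```

Each (address, value) pair could then be written with sheet.Range(address).Value = value, as in the loop above.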

You need to think of it programmatically. Start with a function:
def get_xls_range(start,end):
    letter1,num1 = start[0],int(start[1:]) # get the alpha part and the numeric part (single-letter columns only)
    letter2,num2 = end[0],int(end[1:]) # same for end
    #convert the letters to a range of column letters
    range_cols = [chr(x) for x in range(ord(letter1),ord(letter2)+1)]
    #convert the numerics to a range of row numbers
    range_rows = range(num1,num2+1)
    for col in range_cols:
        for row in range_rows:
            yield "%s%s"%(col,row)
Then all you need to do is iterate over your new function:
for cell in get_xls_range("A1","B22"):
    print(cell)
