how to loop through each row in excel spreadsheet using openpyxl - python

I would like to use the first column of each row in an Excel spreadsheet as a key and the rest of the values in that row as its value, so that I can store them in a dictionary.
The problem is that when I loop through the rows and columns, all the row values end up stored under every key.
import openpyxl
from openpyxl import load_workbook

file = "test.xlsx"
# load the workbook
wb_obj = load_workbook(filename=file)
wsheet = wb_obj['test']

# dictionary to store data
dataDict = {}
value = []
row_count = wsheet.max_row
col_count = wsheet.max_column

# loop to get row and column values
for i in range(2, row_count + 1):
    for j in range(i, col_count + 1):
        key = wsheet.cell(row=i, column=1).value
        print(key)
        value.append(wsheet.cell(row=i, column=j).value)
        print(value)
        dataDict[key] = value

# prompt user for input
userInput = input("Please enter an id to find a person's details: ")
print(dataDict.get(int(userInput)))
data in spreadsheet:
Result I'm expecting:
{1: ['John', 'Doe', 4567891234, 'johndoe#jd.ca'], 2: ['Wayne', 'Kane', 1234567891, 'wd#wd.ca']}
Result I got:
{1: ['John', 'Doe', 4567891234, 'johndoe#jd.ca', 'Kane', 1234567891, 'wd#wd.ca'], 2: ['John', 'Doe', 4567891234, 'johndoe#jd.ca', 'Kane', 1234567891, 'wd#wd.ca']}

Openpyxl already has a proper way to iterate through rows using worksheet.iter_rows(). You can use it to unpack the first cell's value as the key and the values from the other cells as the list in the dictionary, repeating for every row.
from openpyxl import load_workbook

file = "test.xlsx"
# load the workbook
wb_obj = load_workbook(filename=file)
wsheet = wb_obj['test']

dataDict = {}
# start at row 2 to skip the header, then unpack each row into
# the key cell and the remaining value cells
for key, *values in wsheet.iter_rows(min_row=2):
    dataDict[key.value] = [v.value for v in values]
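The `key, *values` unpacking works on any iterable of rows, so the pattern can be seen without a spreadsheet at all. A minimal sketch with hypothetical tuples standing in for the rows:

```python
# hypothetical rows, as iter_rows would yield them (values only)
rows = [
    (1, 'John', 'Doe', 4567891234),
    (2, 'Wayne', 'Kane', 1234567891),
]

dataDict = {}
for key, *values in rows:
    # star-unpacking builds a fresh list per row, so rows can't bleed together
    dataDict[key] = values
```

On openpyxl 2.6+ you can also pass `values_only=True` to `iter_rows()` to get plain values instead of cell objects, which makes the worksheet loop look exactly like this sketch.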

Related

Use openpyxl to populate columns with values from dictionary

I have a dictionary of filenames and corresponding values that I want to populate to a spreadsheet with openpyxl. In the code I've included a small example dict.
The filenames are already in Column A of Sheet1 but I'm struggling to populate the values to Column B in the corresponding rows. There isn't a logical order to the files so I've written a function to iterate over Column A and populate Column B when the required filename is found. I then run the dict through that function. At the moment it's returning a TypeError: 'str' object cannot be interpreted as an integer.
I'm thinking there's definitely a more straightforward way to do this...
from openpyxl import Workbook
import openpyxl

dict = {'file_a': 20, 'file_b': 30, 'file_c': 40}
file = 'spreadsheey.xlsx'
wb = openpyxl.load_workbook(file, read_only=True)
ws = wb.active

def populate_row(filename, length):
    for row in ws.iter_rows('A'):
        for cell in row:
            if cell.value == filename:
                ws.cell(row=cell.row, column=2).value = length

for key, value in dict.items():
    populate_row(key, value)
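The TypeError comes from `iter_rows('A')`: current openpyxl versions take keyword arguments (`min_col`, `max_col`, `min_row`, `max_row`) rather than a range string. Opening with `read_only=True` also means the workbook can't be modified. One possible fix, sketched against an in-memory workbook rather than your actual file:

```python
from openpyxl import Workbook

lengths = {'file_a': 20, 'file_b': 30, 'file_c': 40}

# stand-in for the real spreadsheet: filenames already sit in column A
wb = Workbook()
ws = wb.active
for name in ['file_b', 'file_c', 'file_a']:
    ws.append([name])

# iterate over column A only, writing the matching length into column B
for row in ws.iter_rows(min_col=1, max_col=1):
    for cell in row:
        if cell.value in lengths:
            ws.cell(row=cell.row, column=2, value=lengths[cell.value])

# for a real file: load_workbook(file) without read_only, then wb.save(file)
```

Avoiding the name `dict` for the mapping is also worth doing, since it shadows the built-in type.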

XLSXwriter - writing multiple nested dictionaries

I need to write a multi-nested dictionary to an Excel file. The dictionary is structured as {dict1: {dict2: {dict3: value}}}. My third write loop is raising a KeyError: '' even though there ought not be any blank keys.
I attempted to use this abhorrent monster as it has worked wonderfully for smaller dictionaries, but 1) there has to be a better way 2) I'm unable to scale it for this dictionary...
import xlsxwriter

workbook = xlsxwriter.Workbook('datatest.xlsx')
worksheet = workbook.add_worksheet('test1')

row = 0
col = 0
for key in sam_bps.keys():
    row += 1
    worksheet.write(row, col, key)
for key in sam_bps[sam].keys():
    row, col = 0, 1
    worksheet.write(row, col, key)
    row += 1
for key in sam_bps[sam][bp].keys():
    row, col = 0, 2
    worksheet.write(row, col, key)
    row += 1
for key in sam_bps[sam][bp][mpn].keys():
    row, col = 0, 3
    worksheet.write(row, col, key)
    row += 1
for item in sam_bps[sam][bp][mpn].keys():
    row, col = 0, 4
    worksheet.write(row, col, item)
    row += 1
workbook.close()
I've also considered converting the dictionary to a list of tuples or list of lists, but it doesn't output how I need. And it'll probably cost more time to split those open afterwards anyway.
Here's the code for the dictionary:
sam_bps = {}
sam_bps_header = ['SAM', 'BP', 'MPN', 'PLM_Rate']
for row in plmdata:
    sam, mpn, bp, doc = row[24], row[5], row[7], row[2]
    if sam == '':
        sam = 'Unknown'
    if doc == 'Requirement':
        if sam not in sam_bps:
            sam_bps[sam] = {bp: {mpn: heatscores[mpn]}}
        elif bp not in sam_bps[sam]:
            sam_bps[sam][bp] = {mpn: heatscores[mpn]}
        elif mpn not in sam_bps[sam][bp]:
            sam_bps[sam][bp][mpn] = heatscores[mpn]
print(sam_bps['Dan Reiser'])
EDIT: Added print statement to show output per feedback
{'Fortress Solutions': {'MSM5118160F60JS': 45}, 'Benchmark Electronics': {'LT1963AES8': 15}, 'Axxcss Wireless Solutions Inc': {'MGA62563TR1G': 405}}
I'd like to see this output to an Excel file, with the first column of course being the first key of sam_bps
Your question would be easier to answer if you provided an example of the dictionary you are trying to save.
Have you considered just serializing/deserializing the dictionary to JSON format?
You can save and re-load the file with minimal code:
import json

data = {'test': {'test2': {'test3': 2}}}

with open('data.json', 'w') as outfile:
    json.dump(data, outfile)

with open('data.json') as data_file:
    data = json.load(data_file)
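If the output really must be an Excel file, one approach is to flatten the nested dictionary into one row per leaf before writing, so that each row maps cleanly onto the SAM/BP/MPN/PLM_Rate header. A sketch on hypothetical data shaped like sam_bps (the flattening itself needs no Excel library):

```python
# hypothetical nested dict with the same shape as sam_bps
sam_bps = {
    'Dan Reiser': {
        'Fortress Solutions': {'MSM5118160F60JS': 45},
        'Benchmark Electronics': {'LT1963AES8': 15},
    }
}

# flatten to [SAM, BP, MPN, PLM_Rate] rows
rows = []
for sam, bps in sam_bps.items():
    for bp, mpns in bps.items():
        for mpn, rate in mpns.items():
            rows.append([sam, bp, mpn, rate])

# each flat row can then be written with xlsxwriter, e.g.
# worksheet.write_row(r + 1, 0, rows[r]) with the header in row 0
```

This sidesteps the row/col bookkeeping entirely: the three loops mirror the three dictionary levels, and only the final list touches the worksheet.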

Creating Excel sheet using python, prog is not working

for key, value in fetched.keys():
ValueError: too many values to unpack
Here is the program
import xml.etree.cElementTree as etree
import xlsxwriter
import pprint
from csv import DictWriter

xmlDoc = open('C:/Users/Talha/Documents/abc.xml', 'r')
xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)

sections = ['Srno', 'State', 'Statecd', 'District', 'IssuedOn', 'Day', 'normal_rainfall', 'normal_temp_max', 'normal_temp_min']
fetched = dict()
for sec in sections:
    fetched[sec] = []
    for item in xmlDocTree.iter(sec):
        fetched[sec].append(item.text)
#print fetched['State']

workbook = xlsxwriter.Workbook('data.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0
for key, value in fetched.keys():
    worksheet.write(row, col, key)
    worksheet.write(row, col + 1, value)
    row += 1
workbook.close()
The fetched dict contains data like this:
fetched = {'Srno': ['1','2','3'....], 'State':['dads','dadada'.....],'District':['dadad','gdgdfg'......]}
Use .items() while traversing the dictionary with key and value at the same time.
Use
for key, value in fetched.items():
    worksheet.write(row, col, key)
    worksheet.write(row, col + 1, ','.join(value))
    row += 1
Change keys to items.
You are trying to write a list to a cell. Convert the list to a string.
Use the following concept.
l = ['1','2','3']
print ','.join(l)
Convert the list to a string using a join with a separator.
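Both fixes together can be seen on hypothetical in-memory data, without touching a worksheet: `.items()` makes the two-name unpacking work, and `join` turns each list into a single writable string.

```python
fetched = {'Srno': ['1', '2', '3'], 'State': ['dads', 'dadada']}

# .items() yields (key, value) pairs, so unpacking into two names works;
# ','.join(value) collapses each list into one cell-sized string
cells = [(key, ','.join(value)) for key, value in fetched.items()]
```

Each resulting pair is what ends up in one spreadsheet row: the key in the first column, the joined values in the second.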

iterate through previously filtered rows openpyxl

I have Python code that loads an Excel workbook, iterates through all of the rows in a specified column, saves the rows in a dictionary, and writes that dictionary to a .txt file.
The VB script that is referenced opens the workbook before openpyxl does and filters it to show only some data.
The only problem is that when openpyxl iterates through the workbook, it records every value instead of just the filtered data.
for example if the original spreadsheet is:
A B C
1 x x x
2 x y x
3 x x x
and I filter column B to show only rows that contain "x" and then save the workbook, I want openpyxl to iterate through only rows 1 and 3.
here is my code:
from openpyxl import load_workbook
from openpyxl import workbook
import os

# sort using vba script
os.system(r"C:\script.vbs")

# load workbook
path = 'C:/public/temp/workbook.xlsm'
wb = load_workbook(filename=path)
ws = wb.get_sheet_by_name('Sheet3')

# make empty lists
proj_name = []
proj_num = []
proj_status = []

# iterate through rows and append values to lists
for row in ws.iter_rows('D{}:D{}'.format(ws.min_row, ws.max_row)):
    for cell in row:
        proj_name.append(cell.value)
for row in ws.iter_rows('R{}:R{}'.format(ws.min_row, ws.max_row)):
    for cell in row:
        proj_num.append(cell.value)
for row in ws.iter_rows('G{}:G{}'.format(ws.min_row, ws.max_row)):
    for cell in row:
        proj_status.append(cell.value)

# create dictionary from lists using defaultdict
from collections import defaultdict
dict1 = dict((z[0], list(z[1:])) for z in zip(proj_num, proj_name, proj_status))
with open(r"C:\public\list2.txt", "w") as text_file:
    text_file.write(str(dict1))
Unfortunately openpyxl does not currently include filtering in its functionality. As the documentation notes: "Filters and sorts can only be configured by openpyxl but will need to be applied in applications like Excel."
It looks as though you may have to find another solution ...
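One workaround sometimes used (this relies on how Excel saves the file, not on any documented filtering API): when Excel applies a filter and the workbook is saved, the filtered-out rows are stored as hidden, and openpyxl exposes that flag through `ws.row_dimensions[...].hidden`. You can therefore skip hidden rows yourself. A sketch on a stand-in workbook, where row 2 is hidden the way Excel would hide a filtered-out row:

```python
from openpyxl import Workbook

# stand-in workbook; in your case this would be load_workbook(path)
# on the file saved after filtering
wb = Workbook()
ws = wb.active
for values in (['x', 'x', 'x'], ['x', 'y', 'x'], ['x', 'x', 'x']):
    ws.append(values)
ws.row_dimensions[2].hidden = True  # what Excel does to a filtered-out row

# keep only rows whose row dimension is not marked hidden
visible = [
    [cell.value for cell in row]
    for row in ws.iter_rows()
    if not ws.row_dimensions[row[0].row].hidden
]
```

Whether this works for you depends on the VBA script actually saving the workbook with the filter applied; if it only selects rows without hiding them, no trace survives in the file for openpyxl to read.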
f describes the data I want to filter on: a row is kept when every listed column's value appears in that column's list (e.g. column C is 'CISCO' or 'BD', column E is 'PAI' or 'PAP', and column H is 60).
f = {
    'C': ["CISCO", "BD"],
    'E': ["PAI", "PAP"],
    'H': [60]
}
from openpyxl import load_workbook
from openpyxl.utils.cell import column_index_from_string

def filter_data(rows, f_config, skip_header=False):
    # convert column letter strings to index numbers (e.g. A=1, B=2)
    new_config = {}
    for col, fil in f_config.items():
        if type(col) == str:
            col = column_index_from_string(col)
        new_config[col] = fil

    output = []
    t_filter = len(new_config.items())
    for n, row in enumerate(rows):
        if n == 0 and skip_header:
            # first row is the header
            continue
        for i, (col, fil) in enumerate(new_config.items()):
            if type(fil) != list:
                fil = [fil]
            val = row[col - 1].value
            # break the loop if any of the conditions is not met
            if val not in fil:
                break
            if i + 1 == t_filter:
                # all conditions were met; add the row to the output
                output.append(row)
    return output

# flexible to edit/filter whichever columns of data you want
data1 = filter_data(sheet.rows, {"C": "CISCO", "E": "PAI"}, skip_header=True)
# filter on 2 possibilities, either str or int
data2 = filter_data(data1, {"H": ["60", 60]})

Create dictionary, only adding rows where one column matches a value in a list

I've got 2 CSV files.
First, I want to take 1 column and make a list.
Then I'd like to create a dictionary from another CSV, but only with rows where the value from one column matches a value already in the list created earlier on.
Here's the code so far:
#modified from: http://bit.ly/1iOS7Gu
import pandas
colnames = ['Gene > DB identifier', 'Gene_Symbol', 'Gene > Organism > Name', 'Gene > Homologues > Homologue > DB identifier', 'Homo_Symbol', 'Gene > Homologues > Homologue > Organism > Name', 'Gene > Homologues > Data', 'Sets > Name']
data = pandas.read_csv(raw_input("Enter csv file (including path)"), names=colnames)
filter = set(data.Homo_Symbol.values)
print set(data.Homo_Symbol.values)
#new_dict = raw_input("Enter Dictionary Name")
#source: http://bit.ly/1iOS0e3
import csv
new_dict = {}
with open('C:\Users\Chris\Desktop\gwascatalog.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        if row[0] in filter:
            if row[0] in new_dict:
                new_dict[row[0]].append(row[1:])
            else:
                new_dict[row[0]] = [row[1:]]
print new_dict
Here are the 2 sample data files: http://bit.ly/1hlpyTH
Any ideas? Thanks in advance.
You can use collections.defaultdict to get rid of check for list in dict:
from collections import defaultdict

new_dict = defaultdict(list)
#...
for row in reader:
    if row[0] in filter:
        new_dict[row[0]].append(row[1:])
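The same pattern on hypothetical in-memory rows (the gene names and positions here are made up), to show exactly what defaultdict saves you:

```python
from collections import defaultdict

allowed = {'BRCA1', 'TP53'}  # hypothetical filter set from the first CSV
rows = [
    ['BRCA1', 'chr17', '41196312'],
    ['XYZ9', 'chr1', '100'],        # not in the filter set, so dropped
    ['BRCA1', 'chr17', '41277500'],
]

new_dict = defaultdict(list)
for row in rows:
    if row[0] in allowed:
        # no membership check needed: a missing key gets an empty list
        new_dict[row[0]].append(row[1:])
```

A defaultdict(list) behaves like a regular dict except that looking up a missing key creates it with an empty list, which is what the if/else in the original code did by hand.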
