Creating an Excel sheet using Python, program is not working - python

The program fails with this error:
for key, value in fetched.keys():
ValueError: too many values to unpack
Here is the program:
import xml.etree.cElementTree as etree
import xlsxwriter
import pprint
from csv import DictWriter
xmlDoc = open('C:/Users/Talha/Documents/abc.xml', 'r')
xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)
sections = ['Srno','State','Statecd','District','IssuedOn','Day','normal_rainfall','normal_temp_max','normal_temp_min']
fetched = dict()
for sec in sections:
    fetched[sec] = []
    for item in xmlDocTree.iter( sec ):
        fetched[sec].append( item.text )
#print fetched['State']
workbook = xlsxwriter.Workbook('data.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0
for key, value in fetched.keys():
    worksheet.write(row, col, key)
    worksheet.write(row, col + 1, value)
    row += 1
workbook.close()
The fetched dict contains data like this:
fetched = {'Srno': ['1','2','3'....], 'State':['dads','dadada'.....],'District':['dadad','gdgdfg'......]}

Use .items() when you need both the key and the value while traversing the dictionary. .keys() yields only the keys, so each key string is unpacked character by character, which raises the ValueError.
Change keys to items:
for key, value in fetched.items():
    worksheet.write(row, col, key)
    worksheet.write(row, col + 1, ','.join(value))
    row += 1
You are also trying to write a list to a cell. Convert the list to a string using a join with a separator:
l = ['1','2','3']
print(','.join(l))
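If the goal is one spreadsheet column per section rather than one comma-joined cell per key, the dictionary of lists can be turned into rows first. The following is only a minimal sketch under assumptions: dict_to_rows is a hypothetical helper name, the sample data is made up, and all lists are assumed to have the same length. Each resulting row could then be handed to xlsxwriter's worksheet.write_row(row, 0, data).

```python
def dict_to_rows(fetched):
    """Turn {'A': [a1, a2], 'B': [b1, b2]} into [['A', 'B'], [a1, b1], [a2, b2]].

    Hypothetical helper for illustration. Assumes every list has the same
    length (zip silently stops at the shortest one otherwise).
    """
    header = list(fetched.keys())
    # zip(*lists) pairs up the i-th element of every list: one spreadsheet row
    body = [list(row) for row in zip(*fetched.values())]
    return [header] + body


# Made-up sample data in the same shape as the question's 'fetched' dict
sample = {
    'Srno': ['1', '2', '3'],
    'State': ['s1', 's2', 's3'],
    'District': ['d1', 'd2', 'd3'],
}
rows = dict_to_rows(sample)
for row in rows:
    print(row)
```

This relies on dictionaries preserving insertion order, which holds for Python 3.7 and later.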

Related

XLSXwriter - writing multiple nested dictionaries

I need to write a multi-nested dictionary to an Excel file. The dictionary is structured as {dict1: {dict2: {dict3: value}}}. My third write loop is raising KeyError: '' even though there ought not to be any blank keys.
I attempted to use this abhorrent monster as it has worked wonderfully for smaller dictionaries, but 1) there has to be a better way 2) I'm unable to scale it for this dictionary...
import xlsxwriter
workbook = xlsxwriter.Workbook('datatest.xlsx')
worksheet = workbook.add_worksheet('test1')
row = 0
col = 0
for key in sam_bps.keys():
    row += 1
    worksheet.write(row, col, key)
    for key in sam_bps[sam].keys():
        row, col = 0,1
        worksheet.write(row,col,key)
        row += 1
        for key in sam_bps[sam][bp].keys():
            row,col = 0,2
            worksheet.write(row,col,key)
            row += 1
            for key in sam_bps[sam][bp][mpn].keys():
                row,col = 0,3
                worksheet.write(row,col,key)
                row += 1
                for item in sam_bps[sam][bp][mpn].keys():
                    row,col = 0,4
                    worksheet.write(row,col,item)
                    row += 1
workbook.close()
I've also considered converting the dictionary to a list of tuples or list of lists, but it doesn't output how I need. And it'll probably cost more time to split those open afterwards anyway.
Here's the code for the dictionary:
sam_bps = {}
sam_bps_header = ['SAM','BP','MPN','PLM_Rate']
for row in plmdata:
    sam,mpn,bp,doc = row[24],row[5],row[7],row[2]
    if sam == '':
        sam = 'Unknown'
    if doc == 'Requirement':
        if sam not in sam_bps:
            sam_bps[sam] = {bp:{mpn:heatscores[mpn]}}
        elif bp not in sam_bps[sam]:
            sam_bps[sam][bp] = {mpn:heatscores[mpn]}
        elif mpn not in sam_bps[sam][bp]:
            sam_bps[sam][bp][mpn] = heatscores[mpn]
print(sam_bps['Dan Reiser'])
EDIT: Added print statement to show output per feedback
{'Fortress Solutions': {'MSM5118160F60JS': 45}, 'Benchmark Electronics': {'LT1963AES8': 15}, 'Axxcss Wireless Solutions Inc': {'MGA62563TR1G': 405}}
I'd like to see this output to an Excel file, with the first column of course being the first key of sam_bps
Your question would be easier to answer if you provided an example of the dictionary you are trying to save.
Have you considered just serializing the dictionary to JSON and deserializing it when needed?
You can save and reload the file with minimal code:
import json
data = {'test': {'test2': {'test3':2}}}
with open('data.json', 'w') as outfile:
    json.dump(data, outfile)
with open('data.json') as data_file:
    data = json.load(data_file)
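If the Excel output is still needed, one possible approach is to flatten the nested dictionary into one row per leaf value and write those rows. This is a sketch under assumptions rather than a tested solution for the asker's data: flatten and write_rows are hypothetical helper names, and the sample dictionary only mirrors the shape of the printed output in the question.

```python
def flatten(nested, prefix=()):
    """Yield one tuple per leaf value: (key1, key2, ..., value)."""
    for key, value in nested.items():
        if isinstance(value, dict):
            # descend one level, remembering the keys seen so far
            for row in flatten(value, prefix + (key,)):
                yield row
        else:
            yield prefix + (key, value)


def write_rows(path, header, rows):
    """Write a header and rows to an .xlsx file (requires xlsxwriter)."""
    import xlsxwriter  # imported here so flatten() works without it installed
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    worksheet.write_row(0, 0, header)
    for row_idx, row in enumerate(rows, start=1):
        worksheet.write_row(row_idx, 0, row)
    workbook.close()


# Sample shaped like the question's printed output (illustrative only)
sam_bps = {
    'Dan Reiser': {
        'Fortress Solutions': {'MSM5118160F60JS': 45},
        'Benchmark Electronics': {'LT1963AES8': 15},
    },
}
rows = list(flatten(sam_bps))
print(rows)
# To produce the file:
# write_rows('datatest.xlsx', ['SAM', 'BP', 'MPN', 'PLM_Rate'], rows)
```

Each row repeats the outer keys, so the first column holds the first key of sam_bps, as requested in the question.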

how to loop through each row in excel spreadsheet using openpyxl

I would like to use the first column of each row in an Excel spreadsheet as a key and the rest of the values in that row as its value, so that I can store them in a dictionary.
The problem is that when I loop through the rows and columns, all the row values end up stored under every key.
import openpyxl
from openpyxl import load_workbook
file = "test.xlsx"
#load the work book
wb_obj = load_workbook(filename = file)
wsheet = wb_obj['test']
#dictionary to store data
dataDict = {}
value = []
row_count = wsheet.max_row
col_count = wsheet.max_column
# loop to get row and column values
for i in range(2, row_count+1):
    for j in range(i, col_count+1):
        key = wsheet.cell(row=i, column=1).value
        print (key)
        value.append(wsheet.cell(row=i, column=j).value)
        print (value)
    dataDict[key] = value
#prompt user for input
userInput = input("Please enter an id to find a person's details: ")
print (dataDict.get(int(userInput)))
data in spreadsheet:
Result I'm expecting:
{1: ['John', 'Doe', 4567891234, 'johndoe#jd.ca'], 2: ['Wayne', 'Kane', 1234567891, 'wd#wd.ca']}
Result I got:
{1: ['John', 'Doe', 4567891234, 'johndoe#jd.ca', 'Kane', 1234567891, 'wd#wd.ca'], 2: ['John', 'Doe', 4567891234, 'johndoe#jd.ca', 'Kane', 1234567891, 'wd#wd.ca']}
Openpyxl already has a proper way to iterate through rows using worksheet.iter_rows(). You can use it to unpack the first cell's value as the key and the values from the other cells as the list in the dictionary, repeating for every row.
from openpyxl import load_workbook
file = "test.xlsx"
# load the workbook
wb_obj = load_workbook(filename = file)
wsheet = wb_obj['test']
dataDict = {}
# min_row=2 skips the header row, as the loop in the question does
for key, *values in wsheet.iter_rows(min_row=2):
    dataDict[key.value] = [v.value for v in values]
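The duplicated values in the question come from reusing a single list object: value is created once, before the loops, so every key ends up pointing at the same ever-growing list. A minimal sketch of the difference, with plain Python lists standing in for worksheet rows (the sample rows are made up):

```python
# Plain lists stand in for worksheet rows; the first item of each row is the key
sheet_rows = [
    [1, 'John', 'Doe'],
    [2, 'Wayne', 'Kane'],
]

# Buggy pattern: one list shared by every key
shared = []
buggy = {}
for row in sheet_rows:
    shared.extend(row[1:])
    buggy[row[0]] = shared  # every key refers to the same list object

# Fixed pattern: build a fresh list for each row
fixed = {}
for key, *values in sheet_rows:
    fixed[key] = values  # star-unpacking creates a new list per row

print(buggy)
print(fixed)
```

In the buggy dictionary both keys show all four names, while the fixed one keeps each row's values separate.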

Split a row into multiple cells and keep the maximum value of second value for each gene

I am new to Python and I prepared a script that will modify the following csv file
accordingly:
1) Each row that contains multiple Gene entries separated by the /// such as:
C16orf52 /// LOC102725138 1.00551
should be transformed to:
C16orf52 1.00551
LOC102725138 1.00551
2) The same gene may have different ratio values
AASDHPPT 0.860705
AASDHPPT 0.983691
and we want to keep only the pair with the highest ratio value (delete the pair AASDHPPT 0.860705)
Here is the script I wrote but it does not assign the correct ratio values to the genes:
import csv
import pandas as pd
with open('2column.csv','rb') as f:
    reader = csv.reader(f)
    a = list(reader)
gene = []
ratio = []
for t in range(len(a)):
    if '///' in a[t][0]:
        s = a[t][0].split('///')
        gene.append(s[0])
        gene.append(s[1])
        ratio.append(a[t][1])
        ratio.append(a[t][1])
    else:
        gene.append(a[t][0])
        ratio.append(a[t][1])
    gene[t] = gene[t].strip()
newgene = []
newratio = []
for i in range(len(gene)):
    g = gene[i]
    r = ratio[i]
    if g not in newgene:
        newgene.append(g)
        for j in range(i+1,len(gene)):
            if g==gene[j]:
                if ratio[j]>r:
                    r = ratio[j]
        newratio.append(r)
for i in range(len(newgene)):
    print newgene[i] + '\t' + newratio[i]
if len(newgene) > len(set(newgene)):
    print 'missionfailed'
Thank you very much for any help or suggestion.
Try this:
with open('2column.csv') as f:
    lines = f.read().splitlines()
new_lines = {}
for line in lines:
    cols = line.split(',')
    for part in cols[0].split('///'):
        part = part.strip()
        if part not in new_lines:
            new_lines[part] = cols[1]
        elif float(cols[1]) > float(new_lines[part]):
            new_lines[part] = cols[1]
import csv
with open('clean_2column.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for k, v in new_lines.items():
        writer.writerow([k, v])
First of all, if you're importing Pandas, know that you have I/O Tools to read CSV files.
So first, let's import it that way:
df = pd.read_csv('2column.csv')
Then, you can extract the indexes where you have your '///' pattern:
l = list(df[df['Gene Symbol'].str.contains('///')].index)
Then, you can create your new rows:
for i in l:
    for sub in df['Gene Symbol'][i].split('///'):
        new_row = pd.DataFrame([[sub.strip(), df['Ratio(ifna vs. ctrl)'][i]]], columns=df.columns)
        df = pd.concat([df, new_row], ignore_index=True)
Then, drop the old ones:
df = df.drop(df.index[l])
Then, I'll do a little trick to remove your lowest duplicate values. First, I'll sort them by 'Ratio(ifna vs. ctrl)' in descending order, then I'll drop all the duplicates but the first one:
df = df.sort_values('Ratio(ifna vs. ctrl)', ascending=False).drop_duplicates('Gene Symbol', keep='first')
If you want to keep your sorting by Gene Symbol and reset indexes to have simpler ones, simply do:
df = df.sort_values('Gene Symbol').reset_index(drop=True)
If you want to re-export your modified data to your csv, do :
df.to_csv('2column.csv')
EDIT : I edited my answer to correct syntax errors, I've tested this solution with your csv and it worked perfectly :)
This should work.
It uses the dictionary suggestion of Peter.
import csv
with open('2column.csv','r') as f:
    reader = csv.reader(f)
    original_file = list(reader)
# gets rid of the header
original_file = original_file[1:]
# create an empty dictionary
genes_ratio = {}
# loop over every row in the original file
for row in original_file:
    gene_name = row[0]
    # convert to float so ratios compare numerically, not as strings
    gene_ratio = float(row[1])
    # check if /// is in the string if so split the string
    if '///' in gene_name:
        gene_names = gene_name.split('///')
        # loop over all the resulting components
        for gene in gene_names:
            # strip the spaces left around each name by the split
            gene = gene.strip()
            # check if the component is in the dictionary
            # if not in dictionary set value to gene_ratio
            if gene not in genes_ratio:
                genes_ratio[gene] = gene_ratio
            # if in dictionary compare value in dictionary to gene_ratio
            # if dictionary value is smaller overwrite value
            elif genes_ratio[gene] < gene_ratio:
                genes_ratio[gene] = gene_ratio
    else:
        if gene_name not in genes_ratio:
            genes_ratio[gene_name] = gene_ratio
        elif genes_ratio[gene_name] < gene_ratio:
            genes_ratio[gene_name] = gene_ratio
#loop over dictionary and print gene names and their ratio values
for key in genes_ratio:
    print key, genes_ratio[key]
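One detail worth spelling out: values read from a CSV file are strings, and strings compare character by character, so '10.5' sorts before '9.5'. A compact sketch of the dictionary approach that converts each ratio to float before comparing is shown here. best_ratios is a hypothetical helper name, and the rows are the examples from the question passed in directly instead of being read from a file.

```python
def best_ratios(rows):
    """Map each gene name to its highest ratio.

    rows: iterable of (gene_field, ratio_text) pairs, where gene_field may
    hold several names separated by '///'. Hypothetical helper for illustration.
    """
    best = {}
    for gene_field, ratio_text in rows:
        ratio = float(ratio_text)  # compare numerically, not as text
        for gene in gene_field.split('///'):
            gene = gene.strip()  # remove the spaces around '///'
            if gene not in best or ratio > best[gene]:
                best[gene] = ratio
    return best


# Rows taken from the examples in the question
rows = [
    ('C16orf52 /// LOC102725138', '1.00551'),
    ('AASDHPPT', '0.860705'),
    ('AASDHPPT', '0.983691'),
]
result = best_ratios(rows)
print(result)
```

Splitting on '///' also works for fields with a single name, so no separate branch is needed for that case.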

python how to port from xlsxwriter to xlwt

I want to recreate an xlsxwriter program in xlwt.
I have issues writing a row. Can someone help me with the xlwt module? I found a lot of xlwt code that uses enumerate, but I am not too familiar with xlwt. The problem is that xlwt writes the whole list as a string in the first cell, so I end up with one column full of data. xlsxwriter writes each item in the list to its own cell, which is what I want to do with xlwt. If someone can point me in the right direction, it will be greatly appreciated. Thanks.
this is my code:
def xlsxwriter_res(result):
    workbook = xlsxwriter.Workbook('filename.xlsx')
    for key,value in result.items():
        worksheet = workbook.add_worksheet(key)
        row, col = 0, 0
        for line in value:
            worksheet.write_row(row, col, line) ### Writes each item in list in separate cell
            row += 1
    workbook.close()
def xlwt_res(result):
    workbook = xlwt.Workbook(encoding="utf-8")
    for key,value in result.items():
        worksheet = workbook.add_sheet(key)
        row, col = 0, 0
        for line in value:
            worksheet.write(row, col, line) ### Writes the whole list as string in one cell
            row += 1
    workbook.save('filename.xls')
Try that:
import xlwt
def xlwt_res(result):
    workbook = xlwt.Workbook(encoding="utf-8")
    for key, value in result.items():
        worksheet = workbook.add_sheet(key)
        row = 0 # we assign 'col' later instead
        for line in value:
            # we're going to iterate over the line object
            # and write directly to a cell, incrementing the column id
            for col, cell in enumerate(line):
                worksheet.write(row, col, cell) # writes the list contents just like xlsxwriter.write_row!
            row += 1
    workbook.save('filename.xls')
xlwt_res({'one': ["just one element"], 'two': ["that's a list", "did you know it"], 'three': ["let's", "have", "3"]})
So both xlwt and xlsxwriter yield the same results.
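The enumerate loop from the answer can also be wrapped in a small helper that mimics xlsxwriter's write_row for any worksheet object with a write(row, col, value) method. This is a sketch, not part of xlwt: write_row here is a hypothetical helper, and FakeSheet is a stand-in used only so the example runs without xlwt installed.

```python
def write_row(worksheet, row, col, data):
    """Write each item of data into its own cell, starting at (row, col).

    Works with any object exposing write(row, col, value), which is how
    the xlwt worksheet is used in the answer above.
    """
    for offset, item in enumerate(data):
        worksheet.write(row, col + offset, item)


class FakeSheet:
    """Hypothetical stand-in for a worksheet; records writes in a dict."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


sheet = FakeSheet()
for row_idx, line in enumerate([['a', 'b', 'c'], ['d', 'e']]):
    write_row(sheet, row_idx, 0, line)
print(sheet.cells)
```

With a real xlwt sheet, the same call would be write_row(worksheet, row, 0, line) inside the loop over value.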

Comparing 2 excel files via Python. Is there any other recommended language to use instead of python?

reference file:
fill_in:
Basically, you take the values in column 1 (left) of the fill_in file and compare them with the values in column 1 of the reference file. If the values are an exact match, the value in column 2 of the reference file is placed into column 2 of the fill_in file.
So far, my code is this:
import win32com.client, csv, os, string
# Office 2010 - Microsoft Office Object 14.0 Object Library
from win32com.client import gencache
gencache.EnsureModule('{2DF8D04C-5BFA-101B-BDE5-00AA0044DE52}', 0, 2, 5)
#
# Office 2010 - Excel COM
from win32com.client import gencache
gencache.EnsureModule('{00020813-0000-0000-C000-000000000046}', 0, 1, 7)
#
Application = win32com.client.Dispatch("Excel.Application")
Application.Visible = True
Workbook = Application.Workbooks.Add()
Sheet = Application.ActiveSheet
#
#REFERENCE FILE
f = open("reference_file.csv", "rb")
ref = csv.reader(f)
ref_dict = dict()
#FILE WITH BLANKS
g = open("fill_in.csv", "rb")
fill = csv.reader(g)
fill_dict = dict()
#CODE STARTS
gene_dic = dict()
count = 0
#Make reference file into a dictionary
for line in ref:
    ref_dict[line[1]] = [line[0]]
#Make Fill in file into a dictionary
for i in fill:
    fill_dict[i[1]] = [i[0]]
#finding difference in both dictionaries
diff = {}
for key in ref_dict.keys():
    if(not fill_dict.has_key(key)):
        diff[key] = (ref_dict[key])
    elif(ref_dict[key] != fill_dict[key]):
        diff[key] = (ref_dict[key], fill_dict[key])
for key in fill_dict.keys():
    if(not ref_dict.has_key(key)):
        diff[key] = (fill_dict[key])
fill_dict.update(diff)
print fill_dict
#Put dictionary into an Array
temp = []
dictlist = []
for key, value in fill_dict.iteritems():
    temp = [key, value]
    dictlist.append(temp)
dictlist.sort()
print(dictlist)
for i in dictlist:
    count += 1
    Sheet.Range("A" + str(count)).Value = i[0]
    Sheet.Range("B" + str(count)).Value = i[1]
Workbook.SaveAs(os.getcwd() + "/" + "result1.csv")
The result is this:
But the result was supposed to look like this:
If there is a value in column 2 (column B), it should remain untouched. If the cell is empty and the row has a match in the reference file, the number from the reference file should be written into column B.
I've also tried this code; however, I've only managed to put the result in a list, not in Excel:
r=open("reference_file.csv","rb")
ref = csv.reader(r)
ref_dict = dict()
f=open("fill_in.csv", "rb")
fill = csv.reader(f)
#CODE STARTS
lst = []
lstkey = []
count = 0
#put reference file in a dictionary
for line in ref:
    ref_dict[line[1]] = [line[0]]
all_values = defaultdict(list)
for key in ref_dict:
    for value in (map(lambda x: x.strip(), ref_dict[key][0].split(","))):
        all_values[value].append(key)
for i in lst:
    lstkey.append(all_values[i])
print lstkey
I don't know whether any specific language is best suited for working with Excel files, but you can certainly use Ruby. I personally find Ruby code easier to understand and would use Ruby for a task like this. You can check out this topic where they parse an Excel file and do some checks. Hope it helps.
A couple of thoughts:
If you're thinking about using non-Python solutions, have you tried VBA (Visual Basic for Applications)?
If you stick with Python, take a look at John Machin's outstanding xlrd (Excel read) and xlwt (Excel write) tools. You can find them on http://www.python-excel.org. With a little playing around, you should be able to apply the results of the list you generated to a new spreadsheet or workbook using xlwt.
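The fill-in rule described in the question (keep existing values, fill empty cells from the reference when the key matches) can be sketched with the standard csv module alone. The sketch rests on assumptions: the key is taken to be in the first column and the value in the second, fill_blanks is a hypothetical helper name, and the CSV contents are made up and held in memory rather than read from reference_file.csv and fill_in.csv.

```python
import csv
import io


def fill_blanks(reference_rows, fill_rows):
    """Return fill_rows with empty second columns filled from reference_rows.

    Assumes each row is [key, value]. Rows whose value is already set are
    left untouched, as are keys that do not appear in the reference.
    """
    lookup = {key: value for key, value in reference_rows}
    filled = []
    for key, value in fill_rows:
        if value == '' and key in lookup:
            value = lookup[key]
        filled.append([key, value])
    return filled


# Made-up CSV contents standing in for reference_file.csv and fill_in.csv
reference_csv = "geneA,10\ngeneB,20\ngeneC,30\n"
fill_csv = "geneA,\ngeneB,99\ngeneD,\n"

reference_rows = list(csv.reader(io.StringIO(reference_csv)))
fill_rows = list(csv.reader(io.StringIO(fill_csv)))
result = fill_blanks(reference_rows, fill_rows)

# Write the result as CSV text; a real script would write to a file instead
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(result)
print(out.getvalue())
```

The output rows keep the order of the fill_in file, which a dictionary-based merge followed by a sort would not preserve.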
