openpyxl - change width of n columns - python

I am trying to change the column width for n number of columns.
I am able to do this for rows as per the below code.
rowheight = 2
while rowheight < 601:
    ws.row_dimensions[rowheight].height = 4
    rowheight += 1
The problem I have is that columns are in letters and not numbers.

As pointed out by ryachza, the answer was to use an openpyxl utility; however, the utility to use is get_column_letter and not column_index_from_string, as I want to convert a number to a letter and not vice versa.
Here is the working code
from openpyxl.utils import get_column_letter

# Start changing width from column C onwards
column = 3
while column < 601:
    i = get_column_letter(column)
    ws.column_dimensions[i].width = 4
    column += 1

To get the column index, you should be able to use:
i = openpyxl.utils.column_index_from_string(?)
And then:
ws.column_dimensions[i].width = ?
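For reference, a quick sketch of both conversion utilities on a throwaway workbook (the width value 4 is just an example):

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, column_index_from_string

wb = Workbook()
ws = wb.active

# Convert between column numbers and letters
letter = get_column_letter(3)           # number -> letter: "C"
index = column_index_from_string("AA")  # letter -> number: 27

# column_dimensions is keyed by letter, so convert the number first
ws.column_dimensions[get_column_letter(3)].width = 4
```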


python pandas: fulfill condition and assign a value to it

I am really hoping you can help me here... I need to assign a label (df_label) to an exact file within a dataframe (df_data) and save all labels that appear in each file in a separate txt file (that's the easy bit)
df_data:
file_name file_start file_end
0 20190201_000004.wav 0.000 1196.000
1 20190201_002003.wav 1196.000 2392.992
2 20190201_004004.wav 2392.992 3588.992
3 20190201_010003.wav 3588.992 4785.984
4 20190201_012003.wav 4785.984 5982.976
df_label:
Begin Time (s)
0 27467.100000
1 43830.400000
2 43830.800000
3 46378.200000
I have tried switching to np.array and using a for loop with np.where, but without any success...
If the time values in df_label fall under exactly one entry in df_data, you can use the following:
def get_file_name(begin_time):
    file_names = df_data[
        (df_data["file_start"] <= begin_time)
        & (df_data["file_end"] >= begin_time)
    ]["file_name"].values
    return file_names[0] if file_names.size > 0 else None

df_label["file_name"] = df_label["Begin Time (s)"].apply(get_file_name)
This will add another column file_name to df_label.
If the labels from df_label match the order of files in df_data, you can simply:
add the labels as a new column of df_data (df_data["label"] = df_label["Begin Time (s)"]),
or
use the DataFrame.merge() function (df_data = df_data.merge(df_label, left_index=True, right_index=True)).
More about merging/joining, with examples, can be found here:
https://thispointer.com/pandas-how-to-merge-dataframes-by-index-using-dataframe-merge-part-3/
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
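As a minimal sketch of the index-based merge (the frame contents below are invented for illustration; row i of df_label lines up with row i of df_data):

```python
import pandas as pd

df_data = pd.DataFrame({"file_name": ["a.wav", "b.wav"]})
df_label = pd.DataFrame({"Begin Time (s)": [1.5, 2.5]})

# Join on the shared integer index
merged = df_data.merge(df_label, left_index=True, right_index=True)
print(merged)
```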

for loop for openpyxl multiple chart creation

I'm trying to create a for loop to create multiple line charts in openpyxl, all at once. Certain indices in an array would be the bookends for the data each chart would draw from. Is this possible in openpyxl?
My data in the excel spreadsheet looks like this:
1 Time Battery Voltage
2 2019-06-05 00:00:00 45
3 2019-06-05 00:01:50 49
4 2019-06-05 00:02:30 51
5 2019-06-05 00:04:58 34
...
import os
import openpyxl
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference, Series
from openpyxl.chart.axis import DateAxis
from datetime import date, datetime, timedelta, time

os.chdir(r'C:\Users\user\test')
wb = openpyxl.load_workbook('log.xlsx')
sheet = wb['sheet2']
ws2 = wb['sheet2']

graphIntervals = [0, 50, 51, 100, 101, 150]  # filled with pairs of integers,
# representing the top-left and bottom-right of the rectangular
# selection of cells containing the chart data I'm trying to graph
starts = graphIntervals[::2]
ends = graphIntervals[1::2]

for i in graphIntervals:
    c[i] = LineChart()
    c[i].title = "Chart Title"
    c[i].style = 12
    c[i].y_axis.crossAx = 500
    c[i].x_axis = DateAxis(crossAx=100)
    c[i].x_axis.number_format = 'd-HH-MM-SS'
    c[i].x_axis.majorTimeUnit = "days"
    c[i].y_axis.title = "Battery Voltage"
    c[i].x_axis.title = "Time"
    data = Reference(ws2, min_col=2, min_row=starts, max_col=2, max_row=ends)
    c[i].add_data(data, titles_from_data=True)
    dates = Reference(ws2, min_col=1, min_row=starts, max_row=ends)
    c[i].set_categories(dates)
    s[i] = c[i].series[0]
    s[i].graphicalProperties.line.solidFill = "BE4B48"
    s[i].graphicalProperties.line.width = 25000  # width in EMUs
    s[i].smooth = True  # make the line smooth
    ws2.add_chart(c[i], "C[i+15]")  # +15 for spacing

wb.save('log.xlsx')
Ideally I would end up making (however many values are in graphIntervals/2) charts.
I know I need to incorporate zip() in my data variable, otherwise it has no way to move on to the next set of values to create charts from. I think it would be something like zip(starts, ends), but I'm not sure.
Is any of this possible through openpyxl? Although I haven't found any, does anyone have examples I could reference?
Followed advice in the comments. Here's that function called in a for loop:
for i in range(0, len(graphIntervals), 2):
    min_row = graphIntervals[i] + 1
    max_row = graphIntervals[i+1] + 1
    # skip headers on first row
    if min_row == 1:
        min_row = 2
    dates = chart.Reference(ws2, min_col=1, min_row=min_row, max_row=max_row)
    vBat = chart.Reference(ws2, min_col=2, min_row=min_row, max_col=2, max_row=max_row)
    qBat = chart.Reference(ws2, min_col=3, min_row=min_row, max_col=3, max_row=max_row)
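A sketch of how the zip(starts, ends) idea could drive one chart per interval pair. The worksheet contents, chart titles, and cell anchors here are assumptions for illustration (toy data is built in memory rather than loaded from log.xlsx):

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

wb = Workbook()
ws2 = wb.active

# Toy data: header row plus 150 (time, voltage) rows
ws2.append(["Time", "Battery Voltage"])
for row in range(1, 151):
    ws2.append([row, 40 + row % 10])

graphIntervals = [0, 50, 51, 100, 101, 150]
starts = graphIntervals[::2]
ends = graphIntervals[1::2]

charts = []
for n, (start, end) in enumerate(zip(starts, ends)):
    min_row = max(start + 1, 2)  # +1 for 1-based rows; never include the header
    max_row = end + 1
    ch = LineChart()
    ch.title = "Chart %d" % (n + 1)
    data = Reference(ws2, min_col=2, min_row=min_row, max_col=2, max_row=max_row)
    ch.add_data(data, titles_from_data=False)
    dates = Reference(ws2, min_col=1, min_row=min_row, max_row=max_row)
    ch.set_categories(dates)
    charts.append(ch)
    ws2.add_chart(ch, "D%d" % (2 + n * 15))  # stack chart anchors 15 rows apart
```

This produces one chart per (start, end) pair, i.e. len(graphIntervals) / 2 charts in total.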

How to detect "strikethrough" style from xlsx file in R

I have to check which data contain the "strikethrough" format when importing an excel file in R.
Do we have any method to detect them?
Both R and Python approaches are welcome.
R solution
The tidyxl package can help you...
Example temp.xlsx, with data in A1:A4 of the first sheet (the Excel screenshot is omitted here):
library(tidyxl)
formats <- xlsx_formats("temp.xlsx")
cells <- xlsx_cells("temp.xlsx")
strike <- which(formats$local$font$strike)
cells[cells$local_format_id %in% strike, 2]
# A tibble: 2 x 1
#   address
#   <chr>
# 1 A2
# 2 A4
I present below a small sample program that filters out text with strikethrough applied, using the openpyxl package (I tested it on version 2.5.6 with Python 3.7.0). Sorry it took so long to get back to you.
import openpyxl as opx
from openpyxl.styles import Font

def ignore_strikethrough(cell):
    if cell.font.strike:
        return False
    else:
        return True

wb = opx.load_workbook('test.xlsx')
ws = wb.active
colA = ws['A']
fColA = filter(ignore_strikethrough, colA)
for i in fColA:
    print("Cell {0}{1} has value {2}".format(i.column, i.row, i.value))
    print(i.col_idx)
I tested it on a new workbook with the default worksheet, with the letters a, b, c, d, e in the first five rows of column A, where I had applied strikethrough formatting to b and d. This program filters out the cells in column A whose font has strikethrough applied, and then prints the column, row and value of the remaining ones. The col_idx property returns the 1-based numeric column value.
I found a method below:
from openpyxl import load_workbook

# Assuming the column from 1 - 10 has value "A", and the 5th "A" contains strikethrough
TEST_wb = load_workbook(filename='TEST.xlsx')
TEST_wb_s = TEST_wb.active
for i in range(1, TEST_wb_s.max_row + 1):
    ck_range_A = TEST_wb_s['A' + str(i)]
    if ck_range_A.font.strikethrough == True:
        print('YES')
    else:
        print('NO')
But it doesn't tell the location (in this case the row number), which makes it hard to know where the strikethrough is when there are a lot of results. How can I vectorize the result of this statement?
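One way to collect the locations instead of printing YES/NO is a list comprehension over the column's cells. A sketch (the workbook is built in memory here so it is self-contained; with a real file, load it as above):

```python
from openpyxl import Workbook
from openpyxl.styles import Font

# Build a small workbook: "A" in rows 1-10, strikethrough applied on row 5
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    cell = ws.cell(row=row, column=1, value="A")
    if row == 5:
        cell.font = Font(strike=True)

# Collect the row numbers of every struck-through cell in column A
strike_rows = [cell.row for cell in ws['A'] if cell.font.strike]
print(strike_rows)  # [5]
```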

Two different excel file to match their rows having same name

Using python pandas,
I am trying to write a condition in pandas which will match two columns from two different excel files having the same column name and different numerical values in them. For each column there are 2000 rows to match.
The condition:
if File1(column1value) - File2(column1value) = 0, then update the value with 1;
if File1(column1value) - File2(column1value) is less than or equal to 0.2, then keep File1Column1Value;
if File1(column1value) - File2(column1value) is greater than 0.2, then update the value with 0.
https://i.stack.imgur.com/Nx3WA.jpg
import pandas as pd

df1 = pd.read_excel('file_name1')  # get input from excel files
df2 = pd.read_excel('file_name2')
p1 = df1['p1'].values
p11 = df2['p11'].values
new_col = []  # we will store desired values here
for i in range(len(p1)):
    if p1[i] - p11[i] == 0:
        new_col.append(1)
    elif abs(p1[i] - p11[i]) > 0.2:
        new_col.append(0)
    else:
        new_col.append(p1[i])
df1['new_column'] = new_col  # we add a new column with our values
You can also remove the old column with df.drop('column', axis=1).
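The same three-way condition can also be written without the Python loop using numpy.select. A sketch with invented values in place of the real excel files (column names mirror the loop version):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"p1": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"p11": [1.0, 1.9, 3.5]})

diff = df1["p1"] - df2["p11"]
df1["new_column"] = np.select(
    [diff == 0, diff.abs() > 0.2],  # conditions, checked in order
    [1, 0],                         # value assigned for each condition
    default=df1["p1"],              # otherwise keep File1's value
)
print(df1["new_column"].tolist())  # [1.0, 2.0, 0.0]
```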

How to generalize this calculation with a pandas DataFrame to any number of columns?

I have a file with some data that looks like
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
I can process this data and do math on it just fine:
import sys
import numpy as np
import pandas as pd

def main():
    if len(sys.argv) != 2:
        print("Takes one filename as argument")
        sys.exit()
    file_name = sys.argv[1]
    data = pd.read_csv(file_name, sep=" ", header=None)
    data.columns = ["timestep", "mux", "muy", "muz"]
    t = data["timestep"].count()
    c = np.zeros(t)
    for i in range(0, t):
        for j in range(0, i + 1):
            c[i-j] += data["mux"][i-j] * data["mux"][i]
            c[i-j] += data["muy"][i-j] * data["muy"][i]
            c[i-j] += data["muz"][i-j] * data["muz"][i]
    for i in range(t):
        print(c[i] / (t - i))
The expected result for my sample input above is
42.5
62.0
84.5
110.0
This math is finding the time correlation function for my data, which is the time-average of all permutations of the pairs of products in each column.
I would like to generalize this program to
work on any number of columns (in the i/j loop, for example), and
be able to read in the column names from the file, so as to not have them hard-coded in
Which numpy or pandas methods can I use to accomplish this?
We can reduce it to one loop by making use of array slicing and the sum ufunc to operate along the rows of the dataframe, which in the process makes it generic enough to cover any number of columns, like so -
a = data.values
t = data["timestep"].count()
c = np.zeros(t)
for i in range(t):
    c[:i+1] += (a[:i+1, 1:] * a[i, 1:]).sum(axis=1)
Explanation
1) a[:i+1, 1:] is the slice of all rows up to the i+1-th row and all columns from the second column onwards, i.e. mux, muy and so on.
2) Similarly, for [i,1:], that's the i-th row and all columns from second column onwards.
To keep it "pandas-way", simply replace a[ with data.iloc[.
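Putting the pieces together on the sample input from the question (the frame is built inline here rather than read from a file), the one-loop version reproduces the expected output:

```python
import numpy as np
import pandas as pd

# Sample data from the question: timestep plus three value columns
data = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
    columns=["timestep", "mux", "muy", "muz"],
)

a = data.values
t = data["timestep"].count()
c = np.zeros(t)
for i in range(t):
    # Dot each row up to i (value columns only) with row i, accumulated per lag
    c[:i+1] += (a[:i+1, 1:] * a[i, 1:]).sum(axis=1)

result = [c[i] / (t - i) for i in range(t)]
print(result)  # [42.5, 62.0, 84.5, 110.0]
```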
