openpyxl: find highest row for each column

I'm new to Python programming. I've written a script to get data from an API (using Python 2.7.8), and now I'd like to add it to an Excel spreadsheet where I keep all my data.
In my spreadsheet, each row is one day, but some of the data doesn't become available until up to 30 days later, so some of my columns are not filled all the way to the current date. Basically, my columns are not all the same length.
I'd like to read each column, find the highest row for that column, and then add my data points to the end of that column. If all columns were the same length, this would be simple, but I don't understand how to find the length of each column separately.
I've read through the docs for openpyxl, but I'm new to Python and I don't really understand everything. I think the solution will involve something like 'for each column, get the highest row', and then I would append each data point to that column, but I don't understand how to do the 'for each column' part. Finding the length of each column would also work.
thanks in advance
Edit: I came up with a work around: I know the relative length of the columns so I subtracted that from the number for the last row:
last_row = ws.get_highest_row() + 1
col_num = 1
dataRow_length = len(dataRow)
row_offset = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 7, 14, 28, 1, 2, 3, 7, 14, 28]
for i in range(0, dataRow_length - 1):
    ws.cell(row=(last_row - row_offset[col_num - 1]), column=col_num).value = dataRow[i]
    col_num = col_num + 1

If you iterate over the rows of a worksheet you can always find the length of a row. That should be sufficient for your purposes. If not, please supply some of your code so it's clearer as to what exactly you want to do.
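A minimal sketch of that idea, not a definitive implementation: it walks the rows once and remembers, for each column, the last row that held a value. The helper name last_filled_row_per_column and the sample data are hypothetical. The rows here are plain tuples so the example runs without a workbook; with openpyxl they could come from something like ws.iter_rows(values_only=True) in recent versions, which is an assumption about your installed version.

```python
def last_filled_row_per_column(rows):
    """Return {column_number: last 1-based row number that holds a value}."""
    last_row = {}
    for row_number, row in enumerate(rows, start=1):
        for col_number, value in enumerate(row, start=1):
            if value is not None:          # empty cells are read as None
                last_row[col_number] = row_number
    return last_row


# Hypothetical stand-in for worksheet rows; column 3 lags behind column 1.
rows = [
    ("2015-01-01", 10, 5),
    ("2015-01-02", 11, None),
    ("2015-01-03", None, None),
]

last = last_filled_row_per_column(rows)
# The next free row in a column is its last filled row plus one;
# a column with no values at all is missing from the dict, hence .get().
next_free = {col: last.get(col, 0) + 1 for col in (1, 2, 3)}
```

Each new data point can then be written to row next_free[col] of its own column instead of relying on a fixed offset table.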

Related

How can I improve performance of Sudoku solver?

I can't improve the performance of the following Sudoku Solver code. I know there are 3 loops here and they probably cause slow performance but I can't find a better/more efficient way. "board" is mutated with every iteration of recursion - if there are no zeros left, I just need to exit the recursion.
I tried to isolate "board" from mutation but it hasn't changed the performance. I also tried to use list comprehension for the top 2 "for" loops (i.e. only loop through rows and columns with zeros), tried to find coordinates of all zeros, and then use a single loop to go through them - hasn't helped.
I think I'm doing something fundamentally wrong here with recursion - any advice or recommendation on how to make the solution faster?
def box(board,row,column):
    start_col = column - (column % 3)
    finish_col = start_col + 3
    start_row = row - (row % 3)
    finish_row = start_row + 3
    return [y for x in board[start_row:finish_row] for y in x[start_col:finish_col]]

def possible_values(board,row,column):
    values = {1,2,3,4,5,6,7,8,9}
    col_values = [v[column] for v in board]
    row_values = board[row]
    box_values = box(board, row, column)
    return (values - set(row_values + col_values + box_values))

def solve(board, i_row = 0, i_col = 0):
    for rn in range(i_row,len(board)):
        if rn != i_row: i_col = 0
        for cn in range(i_col,len(board)):
            if board[rn][cn] == 0:
                options = possible_values(board, rn, cn)
                for board[rn][cn] in options:
                    if solve(board, rn, cn):
                        return board
                board[rn][cn] = 0
                #if no options left for the cell, go to previous cell and try next option
                return False
    #if no zeros left on the board, problem is solved
    return True

problem = [
[9, 0, 0, 0, 8, 0, 0, 0, 1],
[0, 0, 0, 4, 0, 6, 0, 0, 0],
[0, 0, 5, 0, 7, 0, 3, 0, 0],
[0, 6, 0, 0, 0, 0, 0, 4, 0],
[4, 0, 1, 0, 6, 0, 5, 0, 8],
[0, 9, 0, 0, 0, 0, 0, 2, 0],
[0, 0, 7, 0, 3, 0, 2, 0, 0],
[0, 0, 0, 7, 0, 5, 0, 0, 0],
[1, 0, 0, 0, 4, 0, 0, 0, 7]
]
solve(problem)

Three things you can do to speed this up:
1. Maintain additional state using arrays of integers to keep track of row, col, and box candidates (or equivalently values already used) so that finding possible values is just possible_values = row_candidates[row] & col_candidates[col] & box_candidates[box]. This is a constant-factor improvement that will change very little in your approach.
2. As kosciej16 suggested, use the min-remaining-values heuristic for selecting which cell to fill next. This will turn your algorithm into crypto-DPLL, giving you early conflict detection (cells with 0 candidates), constraint propagation (cells with 1 candidate), and a lower effective branching factor for the rest.
3. Add logic to detect hidden singles (like the Norvig solver does). This will make your solver a little slower for the simplest puzzles, but it will make a huge difference for puzzles where hidden singles are important (like 17-clue puzzles).
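The first two points can be sketched together as follows. This is an illustration under stated assumptions rather than the answerer's code: the function name solve_with_masks is hypothetical, it tracks the "values already used" variant as bitmasks (bit d stands for digit d), and it picks the next cell by fewest remaining candidates. Hidden-single detection is not included.

```python
def solve_with_masks(board):
    """Solve a 9x9 Sudoku in place (0 = empty cell); return True if solved."""
    all_digits = 0b1111111110      # bits 1..9 set
    row_used = [0] * 9             # digits already placed in each row
    col_used = [0] * 9             # ... each column
    box_used = [0] * 9             # ... each 3x3 box
    empties = []

    for r in range(9):
        for c in range(9):
            d = board[r][c]
            if d:
                bit = 1 << d
                row_used[r] |= bit
                col_used[c] |= bit
                box_used[(r // 3) * 3 + c // 3] |= bit
            else:
                empties.append((r, c))

    def candidates(r, c):
        used = row_used[r] | col_used[c] | box_used[(r // 3) * 3 + c // 3]
        return all_digits & ~used

    def search():
        if not empties:
            return True
        # Min-remaining-values: choose the empty cell with the fewest candidates.
        idx = min(range(len(empties)),
                  key=lambda i: bin(candidates(*empties[i])).count("1"))
        r, c = empties.pop(idx)
        b = (r // 3) * 3 + c // 3
        cand = candidates(r, c)    # 0 means a dead end: the loop body never runs
        for d in range(1, 10):
            bit = 1 << d
            if cand & bit:
                board[r][c] = d
                row_used[r] |= bit
                col_used[c] |= bit
                box_used[b] |= bit
                if search():
                    return True
                row_used[r] &= ~bit
                col_used[c] &= ~bit
                box_used[b] &= ~bit
        board[r][c] = 0            # undo and let the caller try its next digit
        empties.insert(idx, (r, c))
        return False

    return search()


problem = [
    [9, 0, 0, 0, 8, 0, 0, 0, 1],
    [0, 0, 0, 4, 0, 6, 0, 0, 0],
    [0, 0, 5, 0, 7, 0, 3, 0, 0],
    [0, 6, 0, 0, 0, 0, 0, 4, 0],
    [4, 0, 1, 0, 6, 0, 5, 0, 8],
    [0, 9, 0, 0, 0, 0, 0, 2, 0],
    [0, 0, 7, 0, 3, 0, 2, 0, 0],
    [0, 0, 0, 7, 0, 5, 0, 0, 0],
    [1, 0, 0, 0, 4, 0, 0, 0, 7],
]
givens = [row[:] for row in problem]
solved = solve_with_masks(problem)
```

Because the masks are updated incrementally, finding the candidates of a cell no longer rebuilds the row, column, and box lists on every call.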

A result that worked at the end thanks to 53x15 and kosciej16. Not ideal or most optimal but passes the required performance test:
def solve(board, i_row = 0, i_col = 0):
    cells_to_solve = [((rn, cn), possible_values(board,rn,cn)) for rn in range(len(board)) for cn in range(len(board)) if board[rn][cn] == 0]
    if not cells_to_solve: return True
    min_n_of_values = min([len(x[1]) for x in cells_to_solve])
    if min_n_of_values == 0: return False
    best_cells_to_try = [((rn,cn),options) for ((rn,cn),options) in cells_to_solve if len(options) == min_n_of_values]
    for ((rn,cn),options) in best_cells_to_try:
        for board[rn][cn] in options:
            if solve(board, rn, cn):
                return board
        board[rn][cn] = 0
    return False

How to call pandas columns with numeric suffixes in a for loop then apply conditions based on other columns with numeric suffixes (python)?

In python I am trying to update pandas dataframe column values based on the condition of another column value. Each of the column names have numeric suffixes that relate them. Here is a dataframe example:
Nodes_2 = pd.DataFrame([[0, 0, 37.76, 0, 0, 1, 28.32], [0, 0, 45.59, 0, 0, 1, 34.19], [22.68, 0, 22.68, 1, 0, 1, 34.02], [0, 0, 41.03, 0, 0, 1, 30.77], [20.25, 0, 20.25, 1, 0, 1, 30.37]], columns=['ait1', 'ait2', 'ait3', 'Type1', 'Type2', 'Type3', 'Flow'])
And the relevant 'Type' list:
TypeNums = [1, 2, 3, 4, 5, 6, 7, 8]
Specifically, I am trying to update values in the 'ait' columns with 'Flow' values if the 'Type' value equals 1; if the 'Type' value equals 0, the 'ait' value should be 0.
My attempt at applying these conditions is not working as it is getting hung up on how I am trying to reference the columns using the string formatting. See below:
for num in TypeNums:
    if Nodes_2['Type{}'.format(num)] == 1:
        Nodes_2['ait{}'.format(num)] == Nodes_2['Flow']
    elif Nodes_2['Type{}'.format(num)] == 0:
        Nodes_2['ait{}'.format(num)] == 0
That said, how should I call the columns with their numeric suffixes without typing duplicative code calling each name? And is this a correct way of applying the above mentioned conditions?
Thank you!

The correct way is to use np.where or, in your case, just simple multiplication:
for num in TypeNums:
    Nodes_2['ait{}'.format(num)] = Nodes_2['Type{}'.format(num)] * Nodes_2['Flow']
Or, you can multiply all the columns at once:
Nodes_2[['ait{}'.format(num) for num in TypeNums]] = Nodes_2[['Type{}'.format(num) for num in TypeNums]].mul(Nodes_2['Flow'], axis='rows').to_numpy()
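For the np.where variant mentioned above, a minimal sketch on a two-row subset of the question's data. It assumes numpy and pandas are installed. The guard that skips suffixes without a matching column is an addition for illustration, since TypeNums runs to 8 while the example frame only has Type1 to Type3.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's example frame.
Nodes_2 = pd.DataFrame(
    [[0, 0, 37.76, 0, 0, 1, 28.32],
     [22.68, 0, 22.68, 1, 0, 1, 34.02]],
    columns=['ait1', 'ait2', 'ait3', 'Type1', 'Type2', 'Type3', 'Flow'])
TypeNums = [1, 2, 3, 4, 5, 6, 7, 8]

for num in TypeNums:
    type_col = 'Type{}'.format(num)
    if type_col not in Nodes_2.columns:
        continue  # assumption: silently skip suffixes that have no column
    # Where Type == 1 take Flow, otherwise 0.
    Nodes_2['ait{}'.format(num)] = np.where(Nodes_2[type_col] == 1, Nodes_2['Flow'], 0)
```

np.where is the more general choice when the Type columns can hold values other than 0 and 1, in which case plain multiplication would no longer match the stated conditions.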

iterating over non-interpolated Array of symbols python equivalent

I am currently translating a Ruby module into a Python script, and I am having trouble with this part of the translation.
Ruby:
plan_metrics[test_name]={ passed_count: 0, blocked_count: 0, untested_count: 0, failed_count: 0, reviewed_count: 0, test_harness_issue_count: 0, bug_failure_count: 0, defect_list: [] }
entry['runs'].each do |run|
  metric_hash = plan_metrics[test_name]
  %i[passed_count blocked_count untested_count failed_count].each do |key|
    metric_hash[key] = metric_hash[key] + run[key.to_s]
  end
end
In this code, entry['runs'] holds the actual values of passed_count, blocked_count, untested_count, and failed_count, but spread across multiple dictionaries. The loop is supposed to iterate over them, add up all the values, and store each total under ONE symbol (e.g. passed_count) in metric_hash.
Now when I try to translate this into Python, I am not using symbols but instead doing it like this:
My Python translation:
plan_metrics[test_name]={ "passed_count": 0, "blocked_count": 0, "untested_count": 0, "failed_count": 0, "reviewed_count": 0, "test_harness_issue_count": 0, "bug_failure_count": 0, "defect_list": [] }
for run in entry["runs"]:
    metric_hash = plan_metrics[test_name]
    for key in [metric_hash["passed_count"], metric_hash["blocked_count"], metric_hash["untested_count"], metric_hash["failed_count"]]:
        metric_hash[key] = metric_hash[key] + run[str(key)]
But with this I am getting KeyError: 0 on the line metric_hash[key] = metric_hash[key] + run[str(key)]
Would
for key in [metric_hash["passed_count"], metric_hash["blocked_count"], metric_hash["untested_count"], metric_hash["failed_count"]]:
be the proper equivalent of
%i[passed_count blocked_count untested_count failed_count].each do |key|
and if so, what is causing the KeyError: 0?
If not, how can I accomplish what the Ruby example did, iterating over an array of symbols, in Python?
If you need more information on the data, let me know what to print(). Thanks

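In Python you do
for key in [metric_hash["passed_count"], metric_hash["blocked_count"], metric_hash["untested_count"], metric_hash["failed_count"]]:
That means that key takes its values from the list [0, 0, 0, 0]: the list is built from the dictionary's current values, which are all 0, not from the key names. metric_hash[key] then looks up the key 0, which does not exist, hence KeyError: 0.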
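A minimal sketch of the translation with the loop running over the key names themselves, which is what Ruby's %i[...] array of symbols provides. The values of test_name and entry are made up for illustration; only the key names come from the question.

```python
test_name = "example_test"          # hypothetical test name
entry = {                           # hypothetical data in the shape described
    "runs": [
        {"passed_count": 3, "blocked_count": 0, "untested_count": 1, "failed_count": 2},
        {"passed_count": 5, "blocked_count": 1, "untested_count": 0, "failed_count": 0},
    ]
}

plan_metrics = {}
plan_metrics[test_name] = {
    "passed_count": 0, "blocked_count": 0, "untested_count": 0, "failed_count": 0,
    "reviewed_count": 0, "test_harness_issue_count": 0, "bug_failure_count": 0,
    "defect_list": [],
}

metric_hash = plan_metrics[test_name]
for run in entry["runs"]:
    # Iterate over the key names (strings), not over the current counts.
    for key in ("passed_count", "blocked_count", "untested_count", "failed_count"):
        metric_hash[key] += run[key]
```

Since the Python dictionaries already use string keys, no str(key) conversion is needed.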

Divide integer by a list to create a new list

I've created a list of numbers in a specified range. I now want to divide a value by each element in the list, and then add that new value to a new list.
Here's what I've got:
Y = []
value = 55 #can be any value of my choosing
newx = list(range(50,500,10))
newy = value/(newx)**2
Y.append(newy)
I keep getting TypeError with unsupported operand types for ** or pow(): list and int and I don't know why. NOTE: The ** is a syntax for power i.e 1/(x^2)

One "clean" option to do it is to use numpy array:
import numpy as np
value = 55 #can be any value of my choosing
Y = np.arange(50,500,10)
Y = value/(Y)**2
You got an error because in Python you cannot square a list (and you also cannot divide a number by a list). A numpy array allows you to square it, do this division, and apply many other mathematical operations element-wise.

Your description says what you want to do: divide a value by each element in a list. But that's not what you're actually doing, which is trying to divide the value by the list itself. You should do what you say you want to:
Y = [value/v for v in newx]
(I don't understand what the ** is for, you don't mention that anywhere.)

You can just use a list comprehension :
newy = [value/x**2 for x in newx]
The error you get is because the square of a list isn't defined.
The square of a numpy.array is defined though, and would be a new array with the square of each element from the original array.
Depending on the value and range you're working with, you might want to convert the int to float first. In Python 2, / between two ints is integer division, so you could get 0s otherwise:
>>> value = 55
>>> newx = range(50, 500, 10)
>>> [value/x**2 for x in newx]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
But :
>>> [value/float(x)**2 for x in newx]
[0.022, 0.015277777777777777, 0.011224489795918367, 0.00859375, 0.006790123456790123, 0.0055, 0.004545454545454545, 0.0038194444444444443, 0.003254437869822485, 0.0028061224489795917, 0.0024444444444444444, 0.0021484375, 0.0019031141868512112, 0.0016975308641975309, 0.0015235457063711912, 0.001375, 0.0012471655328798186, 0.0011363636363636363, 0.0010396975425330812, 0.0009548611111111111, 0.00088, 0.0008136094674556213, 0.0007544581618655693, 0.0007015306122448979, 0.0006539833531510107, 0.0006111111111111111, 0.0005723204994797086, 0.000537109375, 0.000505050505050505, 0.0004757785467128028, 0.0004489795918367347, 0.0004243827160493827, 0.00040175310445580715, 0.0003808864265927978, 0.0003616042077580539, 0.00034375, 0.0003271861986912552, 0.00031179138321995464, 0.00029745808545159546, 0.0002840909090909091, 0.00027160493827160494, 0.0002599243856332703, 0.00024898143956541424, 0.00023871527777777777, 0.00022907122032486465]
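For readers on Python 3, where / between two ints is already true division, the float conversion is not needed. A short sketch contrasting the two operators on the same inputs (the name floored is introduced here for illustration):

```python
value = 55
newx = list(range(50, 500, 10))

# Python 3: / is true division, so the results are floats.
newy = [value / x ** 2 for x in newx]

# // is floor division and reproduces the all-zero result shown for Python 2.
floored = [value // x ** 2 for x in newx]
```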

Fill missing values in lists

I have a list which consists of 0's and 1's. The list should ideally look like this 0,1,0,1,0,1,0,1,0,1,0,1,0,1.....
But due to some error in logging, my list looks like this: 0,1,0,1,1,1,0,1,0,0,0,1,0,1.... As one can clearly see, some 0's and 1's are missing in the middle. How can I fix this list by adding those 0's and 1's between the existing elements to get the desired list?
Here is the code used by me, this does the task for me but it is not the most pythonic way of writing scripts. So how can I improve on this script?
l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
indices = []
for i in range(1,len(l1)):
    if l1[i]!=l1[i-1]:
        continue
    else:
        if l1[i]==0:
            val=1
        else:
            val=0
        l1.insert(i, val)
EDIT
As asked in the comments, Let me explain why is this important rather than generating 1's and 0's. I have TTL pulse coming i.e. a series of HIGH(1) and LOW(0) coming in and simultaneously time for each of these TTL pulse is logged on 2 machines with different clocks.
Now while machine I is extremely stable and logs each sequence of HIGH(1) and LOW(0) accurately, the other machine ends up missing a couple of them, and as a result I don't have time information for those.
All I wanted was to merge the missing TTL pulse on one machine wrt to the other machine. This will now allow me to align time on both of them or log None for not received pulse.
Reason for doing this rather than correcting the logging thing (as asked in comments) is that this is an old collected data. We have now fixed the logging issue.

You can try something like this:
from itertools import chain
l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
c = max(l1.count(0), l1.count(1))
print list(chain(*zip([0]*c,[1]*c)))
Output:
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

why would you have a list of 0,1,0,1,0,1? there is no good reason i can think of. oh well thats beyond the scope of this question i guess...
list(itertools.islice(itertools.cycle([0,1]),expected_length))

Just multiply a new list.
>>> l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
>>> l1
[0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
>>> [0,1] * (len(l1)//2)
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
If the list has an odd number of elements, add the necessary 0:
>>> l2 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1,0]
>>> l2_ = [0,1] * (len(l2)//2)
>>> if len(l2)%2: l2_.append(0)
...
>>> l2
[0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
>>> l2_
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
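The answers above regenerate the ideal sequence from its length. If the logged values must stay in place and only the gaps be filled, as the edit to the question suggests, a minimal sketch could look like this. The function name fill_missing is hypothetical, and it assumes that at most one pulse is missed between two logged values.

```python
def fill_missing(pulses):
    """Insert the opposite value wherever two equal values are adjacent."""
    filled = []
    for value in pulses:
        if filled and filled[-1] == value:
            filled.append(1 - value)   # a pulse was missed between the two
        filled.append(value)
    return filled


l1 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
fixed = fill_missing(l1)
```

Unlike inserting into l1 while looping over a range computed beforehand, this builds a new list, so every adjacent pair of the original is examined.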
