Fill missing values in lists


I have a list that consists of 0's and 1's. Ideally it should look like this: 0,1,0,1,0,1,0,1,0,1,0,1,0,1,...
But due to an error in logging, my list looks like this: 0,1,0,1,1,1,0,1,0,0,0,1,0,1,... As one can clearly see, some 0's and 1's are missing in the middle. How can I fix this list by inserting the missing 0's and 1's so that it matches the desired pattern?
Here is the code I used. It does the job, but it is not the most Pythonic way of writing it. How can I improve this script?
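l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
indices = []
# range() is evaluated once, so it only covers the original length even though insert() grows the list
for i in range(1,len(l1)):
    if l1[i]!=l1[i-1]:
        continue
    else:
        # two equal neighbours: insert the opposite value between them
        if l1[i]==0:
            val=1
        else:
            val=0
        l1.insert(i, val)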
EDIT
As asked in the comments, let me explain why this matters, rather than just generating a fresh sequence of 1's and 0's. I have a TTL pulse train coming in, i.e. a series of HIGH (1) and LOW (0) states, and the time of each pulse is logged simultaneously on 2 machines with different clocks.
Machine I is extremely stable and logs every HIGH (1) and LOW (0) accurately, but the other machine misses a few of them, so I have no time information for those.
All I want is to fill in the pulses missing on one machine with respect to the other. That will let me align the times on both machines, or log None for pulses that were not received.
The reason for doing this rather than fixing the logging (as suggested in the comments) is that this is old, already-collected data. We have since fixed the logging issue.
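A minimal sketch of the same insert-on-equal-neighbours idea, building a new list instead of inserting into l1 while looping over it. Since the EDIT mentions logging None for pulses that were not received, it also carries a parallel list of timestamps; fill_missing_pulses and its times argument are hypothetical. Caveat: the levels alone cannot tell whether 1, 3, 5, ... samples were lost between two equal values, so, like the question's loop, it assumes exactly one.

```python
def fill_missing_pulses(levels, times):
    """Repair an alternating 0/1 log (hypothetical helper).

    Assumes two equal neighbours mean exactly one sample was lost between
    them, the same assumption the question's loop makes. Inserted samples
    get None as their time, since that pulse was never timestamped.
    """
    filled_levels, filled_times = [], []
    for level, t in zip(levels, times):
        if filled_levels and filled_levels[-1] == level:
            filled_levels.append(1 - level)  # the missing opposite level
            filled_times.append(None)        # no timestamp was logged for it
        filled_levels.append(level)
        filled_times.append(t)
    return filled_levels, filled_times
```

For the question's l1 this yields eighteen alternating values, the same result the question's own loop produces, with None at the four inserted positions.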

You can try something like this:
from itertools import chain
l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
c = max(l1.count(0), l1.count(1))
print(list(chain(*zip([0]*c, [1]*c))))
Output:
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

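Why would you have a list of 0,1,0,1,0,1? I can't think of a good reason, but I guess that's beyond the scope of this question. If you know how many elements you expect (expected_length), you can generate the sequence directly:
import itertools
list(itertools.islice(itertools.cycle([0,1]), expected_length))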

Just build a new list by multiplying [0,1]:
>>> l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
>>> l1
[0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
>>> [0,1] * (len(l1)//2)
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
If the list has an odd number of elements, add the necessary 0:
>>> l2 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1,0]
>>> l2_ = [0,1] * (len(l2)//2)
>>> if len(l2)%2: l2_.append(0)
...
>>> l2
[0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
>>> l2_
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
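A variant of the same idea (not part of the original answer) that covers even and odd lengths without a separate append step: assuming the sequence starts with 0, the expected value at index i is simply i % 2. The name alternating is a hypothetical helper.

```python
def alternating(n):
    """Return n alternating values 0, 1, 0, 1, ... starting with 0."""
    return [i % 2 for i in range(n)]
```

alternating(len(l2)) gives the same list as l2_ above.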

Related

How can I improve performance of Sudoku solver?

I can't manage to improve the performance of the following Sudoku solver. I know there are 3 nested loops here and they probably cause the slow performance, but I can't find a better or more efficient way. "board" is mutated on every level of recursion; if there are no zeros left, I just need to exit the recursion.
I tried to isolate "board" from mutation, but it didn't change the performance. I also tried using a list comprehension for the top 2 "for" loops (i.e. only looping through rows and columns with zeros), and tried finding the coordinates of all zeros and then using a single loop to go through them; neither helped.
I think I'm doing something fundamentally wrong with the recursion. Any advice or recommendations on how to make the solution faster?
def box(board,row,column):
    start_col = column - (column % 3)
    finish_col = start_col + 3
    start_row = row - (row % 3)
    finish_row = start_row + 3
    return [y for x in board[start_row:finish_row] for y in x[start_col:finish_col]]
def possible_values(board,row,column):
    values = {1,2,3,4,5,6,7,8,9}
    col_values = [v[column] for v in board]
    row_values = board[row]
    box_values = box(board, row, column)
    return (values - set(row_values + col_values + box_values))
def solve(board, i_row = 0, i_col = 0):
    for rn in range(i_row,len(board)):
        if rn != i_row: i_col = 0
        for cn in range(i_col,len(board)):
            if board[rn][cn] == 0:
                options = possible_values(board, rn, cn)
                for board[rn][cn] in options:
                    if solve(board, rn, cn):
                        return board
                board[rn][cn] = 0
                #if no options left for the cell, go to previous cell and try next option
                return False
    #if no zeros left on the board, problem is solved
    return True
problem = [
    [9, 0, 0, 0, 8, 0, 0, 0, 1],
    [0, 0, 0, 4, 0, 6, 0, 0, 0],
    [0, 0, 5, 0, 7, 0, 3, 0, 0],
    [0, 6, 0, 0, 0, 0, 0, 4, 0],
    [4, 0, 1, 0, 6, 0, 5, 0, 8],
    [0, 9, 0, 0, 0, 0, 0, 2, 0],
    [0, 0, 7, 0, 3, 0, 2, 0, 0],
    [0, 0, 0, 7, 0, 5, 0, 0, 0],
    [1, 0, 0, 0, 4, 0, 0, 0, 7]
]
solve(problem)

Three things you can do to speed this up:
1. Maintain additional state, using arrays of integers to keep track of row, column, and box candidates (or, equivalently, the values already used), so that finding the possible values is just possible_values = row_candidates[row] & col_candidates[col] & box_candidates[box]. This is a constant-factor improvement that changes very little in your approach.
2. As kosciej16 suggested, use the min-remaining-values heuristic to select which cell to fill next. This will turn your algorithm into crypto-DPLL, giving you early conflict detection (cells with 0 candidates), constraint propagation (cells with 1 candidate), and a lower effective branching factor for the rest.
3. Add logic to detect hidden singles (like the Norvig solver does). This will make your solver a little slower for the simplest puzzles, but it will make a huge difference for puzzles where hidden singles are important (like 17-clue puzzles).
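A minimal sketch of the first two suggestions combined: integer bitmasks for the row, column and box candidates, plus the min-remaining-values choice of the next cell. It is not the answerer's code, it omits hidden-singles detection, and solve_bitmask, box_index and FULL are hypothetical names. Digit d is stored as bit 1 << d, so a cell's candidates are three ANDs rather than building and subtracting sets.

```python
FULL = 0b1111111110  # bits 1..9 set; digit d is represented by (1 << d)


def box_index(r, c):
    """Index 0..8 of the 3x3 box containing cell (r, c)."""
    return (r // 3) * 3 + c // 3


def solve_bitmask(board):
    """Solve a 9x9 board in place (0 = empty); return True if solved."""
    rows, cols, boxes = [FULL] * 9, [FULL] * 9, [FULL] * 9
    empty = []
    for r in range(9):
        for c in range(9):
            d = board[r][c]
            if d:
                bit = 1 << d
                rows[r] &= ~bit  # digit d is no longer a candidate here
                cols[c] &= ~bit
                boxes[box_index(r, c)] &= ~bit
            else:
                empty.append((r, c))

    def candidates(r, c):
        # One AND per constraint instead of rebuilding sets each time.
        return rows[r] & cols[c] & boxes[box_index(r, c)]

    def search():
        if not empty:
            return True  # no empty cells left: solved
        # MRV: choose the empty cell with the fewest remaining candidates.
        i = min(range(len(empty)),
                key=lambda k: bin(candidates(*empty[k])).count("1"))
        r, c = empty[i]
        mask = candidates(r, c)
        if mask == 0:
            return False  # early conflict detection
        empty[i] = empty[-1]  # remove (r, c) from the empty list
        empty.pop()
        b = box_index(r, c)
        for d in range(1, 10):
            bit = 1 << d
            if mask & bit:
                board[r][c] = d
                rows[r] &= ~bit
                cols[c] &= ~bit
                boxes[b] &= ~bit
                if search():
                    return True
                rows[r] |= bit  # undo this choice before trying the next digit
                cols[c] |= bit
                boxes[b] |= bit
        board[r][c] = 0  # every digit failed: restore the cell for the caller
        empty.append((r, c))
        return False

    return search()
```

Called on a copy of the question's problem grid, it fills the grid in place and returns True once a solution is found.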

A solution that worked in the end, thanks to 53x15 and kosciej16. It is not ideal or optimal, but it passes the required performance test:
def solve(board, i_row = 0, i_col = 0):
    # i_row and i_col are no longer used; they are kept so existing calls still work
    cells_to_solve = [((rn, cn), possible_values(board,rn,cn)) for rn in range(len(board)) for cn in range(len(board)) if board[rn][cn] == 0]
    if not cells_to_solve: return True  # no zeros left: solved
    min_n_of_values = min([len(x[1]) for x in cells_to_solve])
    if min_n_of_values == 0: return False  # an empty cell has no candidates: dead end
    best_cells_to_try = [((rn,cn),options) for ((rn,cn),options) in cells_to_solve if len(options) == min_n_of_values]
    for ((rn,cn),options) in best_cells_to_try:
        for board[rn][cn] in options:
            if solve(board, rn, cn):
                return board
        board[rn][cn] = 0
        # no value works for this cell, so this board state cannot be completed
        return False

Divide integer by a list to create a new list

I've created a list of numbers in a specified range. I now want to divide a value by each element in the list, and then add each result to a new list.
Here's what I've got:
Y = []
value = 55 #can be any value of my choosing
newx = list(range(50,500,10))
newy = value/(newx)**2
Y.append(newy)
I keep getting "TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'" and I don't know why. NOTE: ** is Python's power operator, i.e. for each element x I want value/(x^2).

One clean option is to use a numpy array:
import numpy as np
value = 55 #can be any value of my choosing
Y = np.arange(50,500,10)
Y = value/(Y)**2
You got an error because in Python you cannot square a list (and you also cannot divide a number by a list). A numpy array lets you square it, do this division, and perform many other element-wise mathematical operations.

Your description says what you want to do: divide a value by each element in a list. But that's not what you're actually doing, which is trying to divide the value by the list itself. You should do what you say you want to:
Y = [value/v for v in newx]
(I don't understand what the ** is for; you don't mention it anywhere.)

You can just use a list comprehension:
newy = [value/x**2 for x in newx]
The error occurs because the square of a list isn't defined.
The square of a numpy.array is defined, though, and is a new array containing the square of each element of the original array.
In Python 2, where / between two ints is integer division, you might want to convert the int to a float first, depending on the value and range you're working with; otherwise you could get 0s (in Python 3, / always performs true division):
>>> value = 55
>>> newx = range(50, 500, 10)
>>> [value/x**2 for x in newx]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
But:
>>> [value/float(x)**2 for x in newx]
[0.022, 0.015277777777777777, 0.011224489795918367, 0.00859375, 0.006790123456790123, 0.0055, 0.004545454545454545, 0.0038194444444444443, 0.003254437869822485, 0.0028061224489795917, 0.0024444444444444444, 0.0021484375, 0.0019031141868512112, 0.0016975308641975309, 0.0015235457063711912, 0.001375, 0.0012471655328798186, 0.0011363636363636363, 0.0010396975425330812, 0.0009548611111111111, 0.00088, 0.0008136094674556213, 0.0007544581618655693, 0.0007015306122448979, 0.0006539833531510107, 0.0006111111111111111, 0.0005723204994797086, 0.000537109375, 0.000505050505050505, 0.0004757785467128028, 0.0004489795918367347, 0.0004243827160493827, 0.00040175310445580715, 0.0003808864265927978, 0.0003616042077580539, 0.00034375, 0.0003271861986912552, 0.00031179138321995464, 0.00029745808545159546, 0.0002840909090909091, 0.00027160493827160494, 0.0002599243856332703, 0.00024898143956541424, 0.00023871527777777777, 0.00022907122032486465]
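For Python 3 readers, the same comparison looks like this minimal sketch, reusing the answer's value and range: / already returns floats without any float() conversion, and // (floor division) reproduces the all-zero result that Python 2's int / int gave above.

```python
value = 55
newx = range(50, 500, 10)

# Python 3: / is true division, so no float() conversion is needed.
true_div = [value / x**2 for x in newx]

# // is floor division; on two ints it matches what / did in Python 2.
floor_div = [value // x**2 for x in newx]
```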

openpyxl: find highest row for each column

I'm new to Python programming. I've written a script to get data from an API (using Python 2.7.8), and now I'd like to add it to an Excel spreadsheet where I keep all my data.
In my spreadsheet, each row is one day, but some of the data doesn't become available until up to 30 days later, so some of my columns are not full all the way to the current date. Basically, not all my column lengths are the same.
I'd like to read each column, find the highest row for that column, and then add my data points to the end of that column. If all columns were the same length, this would be simple, but I don't understand how to find the length of each column separately.
I've read through the openpyxl docs, but I'm new to Python and I don't really understand everything. I think the solution will involve something like 'for each column, get the highest row', and then I would append each data point to that column, but I don't understand how to do the 'for each column' part. Finding the length of each column would also work.
Thanks in advance.
Edit: I came up with a workaround. I know the relative lengths of the columns, so I subtract each column's offset from the number of the last row:
last_row = ws.get_highest_row() + 1
col_num = 1
dataRow_length = len(dataRow)
row_offset = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 7, 14, 28, 1, 2, 3, 7, 14, 28]
for i in range(0, dataRow_length - 1):
    ws.cell(row=(last_row - row_offset[col_num - 1]), column=col_num).value = dataRow[i]
    col_num = col_num + 1

If you iterate over the rows of a worksheet, you can always find the length of a row. That should be sufficient for your purposes. If not, please supply some of your code so it's clearer what exactly you want to do.
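A small sketch of the "for each column, find the highest row" step. It assumes an openpyxl version recent enough to support ws.iter_cols(values_only=True), which yields one tuple of cell values per column, padded with None down to the sheet's last used row; last_filled_row is a hypothetical helper that scans such a tuple for the last non-empty cell.

```python
def last_filled_row(column_values):
    """Return the 1-based row number of the last non-empty cell, or 0 if none.

    column_values is any sequence of cell values from the top of the column
    down, e.g. one tuple yielded by ws.iter_cols(values_only=True).
    """
    last = 0
    for row_number, value in enumerate(column_values, start=1):
        if value is not None and value != "":
            last = row_number  # keep the most recent non-empty row seen
    return last
```

With a worksheet ws, looping over enumerate(ws.iter_cols(values_only=True), start=1) and computing last_filled_row(values) + 1 gives the first free row of each column, which can then be written with ws.cell(row=..., column=..., value=...). This replaces the hard-coded row_offset list with a per-column scan.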

Class based directional indicator

I'm creating a class-based directional indicator that, given a number of days (n_days) and a list of numbers, gives the number of days out of the most recent n_days on which the number was higher than the previous number, minus the number of days out of those n_days on which the number went down. So if the number in the list increases I want it to return +1, if it decreases I want it to return -1, and otherwise 0; the first number should therefore always be 0, since there is nothing to compare it to. Then, based on n_days, I want to take the sum of the changes over the most recent n_days. For example, for the list [1,2,2,1,2,1] the change should be [0,+1,0,-1,1,-1], and the sum of the changes over the 2 most recent numbers for each day should be [0,+1,-1,0,+1,0], because on the first day there is only 0, on the second day you take the sum of the most recent two days 0+(+1)=1, on the third day (+1)+0=+1, on the fourth day 0+(-1)=-1, and so forth. Here is my code, which is not working:
class Directionalindicator():
    def __init__(self, n_days, list_of_prices):
        self.n_days = n_days
        self.list_of_prices = list_of_prices
    def calculate(self):
        change = []
        for i in range(len(self.list_of_prices)):
            if self.list_of_prices[i-1] < self.list_of_prices[i]:
                change.append(1)
            elif self.list_of_prices[i-1] > self.list_of_prices[i]:
                change.append(-1)
            else:
                change.append(0)
        directional = []
        for i in range(len(change)):
            directional.append(sum(change[i+1-self.n_days:i+1]))
        return directional
Testing it with:
y = Directionalindicator(2,[1,2,2,1,2,1])
y.calculate()
should return:
[0,+1,+1,-1,0,0]
and it does.
But testing it with:
y = Directionalindicator(3, [1,2,3,4,5,6,7,8,9,10])
y.calculate()
should return
[0, 0, 2, 3, 3, 3, 3, 3, 3, 3]
but it returns
[0, 0, 1, 3, 3, 3, 3, 3, 3, 3]
I printed change to see what it was doing and the first value is a -1 instead of a 0. Also, the code in one of the answers works using zip, but I don't understand why mine doesn't work for that list from 1-10.

Your comparison
i > i-1
will always be True. You are comparing each price to itself minus one, which will always be smaller. Instead, you should be comparing pairs of prices. zip is useful for this:
change = [0] # first price always zero change
for a, b in zip(self.list_of_prices, self.list_of_prices[1:]):
    if a < b: # price went up
        change.append(1)
    elif a > b: # price went down
        change.append(-1)
    else: # price stayed the same
        change.append(0)
When you plug this into your code and use your example
Directionalindicator(2, [1, 2, 2, 1, 2, 1])
you get:
change == [0, 1, 0, -1, 1, -1]
directional == [0, 1, 1, -1, 0, 0]
This seems to be correct according to your initial statement of the rules, but for some reason doesn't match your "expected output" [0, 1, -1, 0, 1, 0] from the end of your question.
The reason your edit doesn't work is that you are using i-1 as an index into the list. When i == 0, i-1 == -1, and list_of_prices[-1] gives you the last element in the list. Therefore change contains [-1, 1, 1, 1, 1, 1, 1, 1, 1, 1], because it compares 1 with 10, not [0, 1, 1, 1, 1, 1, 1, 1, 1, 1] as you expected.
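Plugging the zip-based change calculation into the question's class might look like the sketch below. It makes the warm-up period explicit: days with fewer than n_days of history get 0, which matches what the original slicing produced for both examples in the question. The separate changes method and the (cur > prev) - (cur < prev) sign trick are editorial choices, not part of the answer.

```python
class Directionalindicator:
    """Rolling sum of day-over-day direction (+1 up, -1 down, 0 flat)."""

    def __init__(self, n_days, list_of_prices):
        self.n_days = n_days
        self.list_of_prices = list_of_prices

    def changes(self):
        prices = self.list_of_prices
        change = [0] if prices else []  # the first day has nothing to compare with
        for prev, cur in zip(prices, prices[1:]):
            change.append((cur > prev) - (cur < prev))  # +1, -1 or 0
        return change

    def calculate(self):
        change = self.changes()
        directional = []
        for i in range(len(change)):
            if i + 1 < self.n_days:
                directional.append(0)  # fewer than n_days of history yet
            else:
                directional.append(sum(change[i + 1 - self.n_days:i + 1]))
        return directional
```

Both of the question's test cases then return the expected lists, including [0, 0, 2, 3, 3, 3, 3, 3, 3, 3] for the 1 to 10 input.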

Python nonogram uniqueness

I'm trying to write a Python script to determine if a given nonogram is unique. My current script just takes way too long to run so I was wondering if anyone had any ideas.
I understand that the general nonogram problem is NP-hard. However, I know two pieces of information about my given nonograms:
1. When representing the black/white boxes as 0s and 1s, respectively, I know how many of each I have.
2. I'm only considering 6x6 nonograms.
I initially used a brute-force approach (so 2^36 cases). Knowing (1), however, I was able to narrow it down to n-choose-k (36 choose the number of zeroes) cases. However, when k is near 18, this is still ~2^33 cases, and it takes days to run.
Any ideas how I might speed this up? Is it even possible?
Again, I don't care what the solution is -- I already have it. What I'm trying to determine is if that solution is unique.
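The case counts quoted above can be checked directly. This is only a quick verification of the arithmetic (math.comb requires Python 3.8+), showing that fixing the number of filled cells removes only about 3 bits' worth of cases.

```python
import math

all_grids = 2 ** 36            # every black/white assignment of 36 cells
balanced = math.comb(36, 18)   # grids with exactly 18 cells of one colour
print(all_grids, balanced, round(math.log2(balanced), 2))
```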
EDIT:
This isn't exactly the full code but has the general idea:
import itertools
def unique(nonogram):
    # find_rows, find_cols, nonogram_rows and nonogram_cols are defined elsewhere (not shown)
    found = 0
    # create all combinations with the same number of 1s and 0s as incoming nonogram
    for entry in itertools.combinations(range(len(nonogram)), nonogram.count(1)):
        blank = [0]*len(nonogram) # initialize blank nonogram
        for element in entry:
            blank[element] = 1 # distribute 1s across nonogram
        rows = find_rows(blank) # create row headers (like '2 1')
        cols = find_cols(blank)
        if rows == nonogram_rows and cols == nonogram_cols:
            found += 1 # row and col headers same as original nonogram
            if found > 1:
                break # obviously not unique
    if found == 1:
        print('Unique nonogram')

I can't think of a clever way to prove uniqueness other than to solve the problem, but 6x6 is small enough that we can basically do a brute-force solution. To speed things up, instead of looping over every possible nonogram we can loop over all satisfying rows. Something like this (note: untested) should work:
from itertools import product, groupby
from collections import defaultdict
def vec_to_spec(v):
    # clue for one line: lengths of the runs of 1s
    return tuple(len(list(g)) for k,g in groupby(v) if k)
def build_specs(n=6):
    # map every clue to all length-n lines that produce it
    specs = defaultdict(list)
    for v in product([0,1], repeat=n):
        specs[vec_to_spec(v)].append(v)
    return specs
def check(rowvecs, row_counts, col_counts):
    colvecs = zip(*rowvecs)
    row_pass = all(vec_to_spec(r) == tuple(rc) for r,rc in zip(rowvecs, row_counts))
    col_pass = all(vec_to_spec(r) == tuple(rc) for r,rc in zip(colvecs, col_counts))
    return row_pass and col_pass
def nonosolve(row_counts, col_counts):
    specs = build_specs(len(col_counts))  # each row has one cell per column
    possible_rows = [specs[tuple(r)] for r in row_counts]
    sols = []
    for poss in product(*possible_rows):
        if check(poss, row_counts, col_counts):
            sols.append(poss)
    return sols
from which we learn that
>>> rows = [[2,2],[4], [1,1,1,], [2], [1,1,1,], [3,1]]
>>> cols = [[1,1,2],[1,1],[1,1],[4,],[2,1,],[3,2]]
>>> nonosolve(rows, cols)
[((1, 1, 0, 0, 1, 1), (0, 0, 1, 1, 1, 1), (1, 0, 0, 1, 0, 1),
(0, 0, 0, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 1, 1, 0, 0, 1))]
>>> len(_)
1
is unique, but
>>> rows = [[1,1,1],[1,1,1], [1,1,1,], [1,1,1], [1,1,1], [1,1,1]]
>>> cols = rows
>>> nonosolve(rows, cols)
[((0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0)),
((1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1))]
>>> len(_)
2
isn't.
[Note that this isn't a very good solution for the problem in general as it throws away most of the information, but it was straightforward.]
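Because the question only needs to know whether a second solution exists, the search can stop as soon as it finds two. The sketch below reworks the answer's functions into a generator (solutions and is_unique are hypothetical names) and drops the separate row check, since every candidate row already matches its clue by construction. It is still brute force over candidate rows, just with an early exit.

```python
from collections import defaultdict
from itertools import groupby, islice, product


def vec_to_spec(v):
    """Clue for one line: lengths of the runs of 1s, e.g. (1, 1, 0, 1) -> (2, 1)."""
    return tuple(len(list(g)) for k, g in groupby(v) if k)


def build_specs(n):
    """Map every clue to the list of length-n lines that produce it."""
    specs = defaultdict(list)
    for v in product((0, 1), repeat=n):
        specs[vec_to_spec(v)].append(v)
    return specs


def solutions(row_counts, col_counts):
    """Lazily yield every grid that satisfies all row and column clues."""
    specs = build_specs(len(col_counts))  # each row has one cell per column
    candidate_rows = [specs[tuple(r)] for r in row_counts]
    col_targets = [tuple(c) for c in col_counts]
    for rows in product(*candidate_rows):
        # Rows already match their clues by construction; only check columns.
        if all(vec_to_spec(col) == t for col, t in zip(zip(*rows), col_targets)):
            yield rows


def is_unique(row_counts, col_counts):
    """True if exactly one grid fits; stops after finding a second one."""
    return len(list(islice(solutions(row_counts, col_counts), 2))) == 1
```

For the alternating example above, is_unique returns False as soon as the second checkerboard is found, without enumerating any further candidates.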

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
