Number manipulation in Python

I am new to scripting languages. I have an Excel (.xls, 2003) file which contains the following numbers:
2 48
3 49
6 57
11 89
19 120
29 110
32 105
I am trying to do the following:
1. Read the Excel file (which I have done).
2. Find the difference between each pair of consecutive numbers in column 2, in both the positive and the negative sense.
3. Find the corresponding number in column 1 where the difference found in step 2 is largest, in the positive and in the negative sense.
I have the following script for reading the Excel file, but I am not sure how to proceed:
import xlrd
myexcel = xlrd.open_workbook('sample.xls')
#print "WorkSheets:", myexcel.sheet_by_names()
sheet = myexcel.sheet_by_index(0)
c = sheet.col_values(1)
print c
#data = [] #make a data store
I am expecting the following output:
Max positive difference :11
Max negative difference :29

Here you go:
col1 = [2, 3, 6, 11, 19, 29, 32]
col2 = [48, 49, 57, 89, 120, 110, 105]
pd, nd, pi, ni = 0, 0, -1, -1
for i in range(len(col2)-1):
    d = col2[i+1] - col2[i]
    if d > 0 and d > pd:
        pd, pi = d, i
    if d < 0 and abs(d) > nd:
        nd, ni = abs(d), i
print "Max positive difference :" + str(col1[pi+1])
print "Max negative difference :" + str(col1[ni+1])
Output:
>>>
Max positive difference :11
Max negative difference :29
Update : Short version
col1 = [2, 3, 6, 11, 19, 29, 32]
col2 = [48, 49, 57, 89, 120, 110, 105]
m = [(x[1] - x[0]) for x in zip(col2[:1] + col2, col2 + col2[-1:])]
print "Max positive difference :" + str(col1[m.index(max(m))])
print "Max negative difference :" + str(col1[m.index(min(m))])
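A Python 3 sketch of the same idea, in case it helps readers who are not on Python 2. The function name `largest_steps` is made up for illustration, and the two columns are assumed to have been read into equal-length lists already (for example with `sheet.col_values(0)` and `sheet.col_values(1)` as in the question). Each difference is paired with the column-1 value of the later row, which is what the expected output in the question implies.

```python
def largest_steps(col1, col2):
    """Return the column-1 values at the largest rise and largest drop in col2.

    Hypothetical helper, not part of xlrd. Each difference col2[i+1] - col2[i]
    is paired with col1[i+1], the later row of the pair.
    """
    steps = [(b - a, label) for a, b, label in zip(col2, col2[1:], col1[1:])]
    if not steps:
        raise ValueError("need at least two rows")
    rise = max(steps, key=lambda s: s[0])
    # If col2 never decreases, this is simply the smallest step.
    drop = min(steps, key=lambda s: s[0])
    return rise[1], drop[1]


col1 = [2, 3, 6, 11, 19, 29, 32]
col2 = [48, 49, 57, 89, 120, 110, 105]
pos, neg = largest_steps(col1, col2)
print("Max positive difference :" + str(pos))  # 11
print("Max negative difference :" + str(neg))  # 29
```

Returning the values instead of printing them inside the function makes it easy to reuse the result elsewhere.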

You can try this without excessive use of local variables, loops and conditions:
#! /usr/bin/python
col1 = [2, 3, 6, 11, 19, 29, 32]
col2 = [48, 49, 57, 89, 120, 110, 105]
difs = [ (a, c - b) for a, b, c in zip (col1 [1:], col2, col2 [1:] ) ]
print (max (difs, key = lambda x: x [1] ) [0] )
print (max (difs, key = lambda x: -x [1] ) [0] )

I think this is something like what you're looking for:
def get_max_row():
    max_pos_difference = 0
    max_neg_difference = 0
    max_pos_row = 0
    max_neg_row = 0
    # Stop one row early so that rownum + 1 is always a valid row.
    for rownum in range(sheet.nrows - 1):
        this_row_value = sheet.cell(rownum, 1).value
        next_row_value = sheet.cell(rownum + 1, 1).value
        difference = next_row_value - this_row_value
        if difference > max_pos_difference:
            max_pos_difference = difference
            # Remember the later row of the pair, as in the expected output.
            max_pos_row = rownum + 1
        if difference < max_neg_difference:
            max_neg_difference = difference
            max_neg_row = rownum + 1
    return (max_pos_difference, sheet.cell(max_pos_row, 0).value, max_neg_difference, sheet.cell(max_neg_row, 0).value)
(max_pos_difference, max_pos_row_col_1_value, max_neg_difference, max_neg_row_col_1_value) = get_max_row()
print "Pos val: {} Pos row: {} Neg val: {} Neg row: {}".format(max_pos_difference, max_pos_row_col_1_value, max_neg_difference, max_neg_row_col_1_value)

Related

find index of n consecutive values greater than zero with the largest sum from a numpy array (or pandas Series)

So here is my problem: I have an array like this:
arr = array([0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0])
In this array I want to find n consecutive values, greater than zero with the biggest sum. In this example with n = 5 this would be array([20, 26, 32, 37, 52]) and the index would be 5.
What I tried is of course a loop:
n = 5
max_sum = 0
max_loc = 0
for i in range(arr.size - n + 1):  # + 1 so that the last window is included
    if all(arr[i:i + n] > 0) and arr[i:i + n].sum() > max_sum:
        max_sum = arr[i:i + n].sum()
        max_loc = i
print(max_loc)
This is fine for a few short arrays, but I need to use it on many much longer arrays.
I was experimenting with numpy so I would only have to iterate non-zero value groups:
diffs = np.concatenate((np.array([False]), np.diff(arr > 0)))
groups = np.split(arr, np.where(diffs)[0])
for group in groups:
    if group.sum() > 0 and group.size >= n:
        ...
This is nice, but I believe it is not the right direction. I am looking for a simpler and faster numpy / pandas solution that really uses the power of these packages.
Using cross-correlation, numpy.correlate, is a possible, concise and fast solution:
n = 5
masked = arr.copy()  # work on a copy so that arr keeps its original values
masked[masked <= 0] = np.iinfo(arr.dtype).min  # the most negative integer possible
# Thanks for the np.iinfo suggestion, @Corralien
idx = np.argmax(np.correlate(masked, np.ones(n), 'valid'))
idx, arr[idx:(idx+n)]
Another possible solution:
n, l = 5, arr.size
masked = arr.copy()
masked[masked <= 0] = np.iinfo(arr.dtype).min  # the most negative integer possible
# Thanks for the np.iinfo suggestion, @Corralien
# dtype=float keeps the sum from overflowing when a window holds several masked values
idx = np.argmax([np.sum(np.roll(masked, -x)[:n], dtype=float) for x in range(l-n+1)])
idx, arr[idx:(idx+n)]
Output:
(5, array([20, 26, 32, 37, 52]))
You can use sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view
N = 5
win = sliding_window_view(arr, N)
idx = ((win.sum(axis=1)) * ((win>0).all(axis=1))).argmax()
print(idx, arr[idx:idx+N])
# Output
5 [20 26 32 37 52]
Answer greatly enhanced by chrslg to save memory and keep win as a view.
Update
A nice bonus is this should work with Pandas Series just fine.
import pandas as pd
N = 5
idx = pd.Series(arr).where(lambda x: x > 0).rolling(N).sum().shift(-N+1).idxmax()
print(idx, arr[idx:idx+N])
# Output
5 [20 26 32 37 52]
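For checking the vectorized answers above against a plain reference, here is a dependency-free sketch that keeps a running window sum and a count of non-positive values in the window. This is a different technique from the NumPy and pandas solutions, and the name `best_positive_window` is made up for illustration. Unlike `argmax`, it returns `None` when no window of `n` strictly positive values exists.

```python
def best_positive_window(values, n):
    """Return the start index of the n-long window of all-positive values
    with the largest sum, or None if no such window exists.

    Hypothetical reference helper using a running sum, not a NumPy API.
    """
    best_idx, best_sum = None, None
    window_sum = 0
    non_positive = 0  # how many values <= 0 are inside the current window
    for i, v in enumerate(values):
        window_sum += v
        non_positive += v <= 0
        if i >= n:  # drop the value that just left the window
            old = values[i - n]
            window_sum -= old
            non_positive -= old <= 0
        if i >= n - 1 and non_positive == 0:
            if best_sum is None or window_sum > best_sum:
                best_idx, best_sum = i - n + 1, window_sum
    return best_idx


arr = [0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0]
idx = best_positive_window(arr, 5)
print(idx, arr[idx:idx + 5])  # 5 [20, 26, 32, 37, 52]
```

It visits each element once, so it is mainly useful as a correctness check rather than as a replacement for the vectorized versions.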

How to print the sum of columns and rows in python?

I would like to print sum of each row and sum of each column of a two dimensional array, like this:
sum row1 = 123 (numbers are not real sums, just for example)
sum row2 = 123
sum row3 = 123
And the same with columns. I know how to do it in Java, but I don't know how to do it in Python.
This is my code (the code for the sums of rows and columns is missing, because I don't know how to write it):
from random import randint
dim1 = input("Insert first dimension: ")
dim1 = int(dim1)
dim2 = input("Insert second dimension: ")
dim2 = int(dim2)
table1d = []
for i in range(dim1):
    table2d = []
    for j in range(dim2):
        table2d.append(randint(1, 170))
    table1d.append(table2d)
print(table1d)
totalSum = sum(map(sum, table1d))
print(totalSum)
sumRows = 0
for i in range(0, len(table1d), 1):
    sumRows += table1d[i]
For rows you need only
sums_in_rows = list(map(sum, table1d))
print(sums_in_rows)
For columns it needs more
sums_in_columns = [0]*len(table1d[0]) # create list for all results
for row in table1d:
    for c, value in enumerate(row):
        sums_in_columns[c] += value
print(sums_in_columns)
You can also convert it to numpy array and then you have
import numpy as np
arr = np.array(table1d)
print('rows:', arr.sum(axis=1))
print('cols:', arr.sum(axis=0))
print('total:', arr.sum())
from random import randint
dim1 = input("Insert first dimension: ")
dim1 = int(dim1)
dim2 = input("Insert second dimension: ")
dim2 = int(dim2)
table1d = []
#x = 0
for i in range(dim1):
    table2d = []
    for j in range(dim2):
        table2d.append(randint(1, 170))
        #table2d.append(x)
        #x += 1
    table1d.append(table2d)
print(table1d)
sums_in_rows = list(map(sum, table1d))
print(sums_in_rows)
sums_in_columns = [0]*len(table1d[0])
for row in table1d:
    for c, value in enumerate(row):
        sums_in_columns[c] += value
print(sums_in_columns)
import numpy as np
arr = np.array(table1d)
print(arr.sum(axis=1))
print(arr.sum(axis=0))
print(arr.sum())
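Without numpy, the column loop above can also be written with `zip(*table)`, which regroups the values column by column. A minimal sketch, assuming a rectangular table (with ragged rows, `zip` silently stops at the shortest row); the hard-coded `table` stands in for the random one so the result is reproducible:

```python
table = [
    [1, 2, 3, 4],
    [3, 2, 1, 5],
    [4, 5, 6, 6],
]

row_sums = [sum(row) for row in table]
# zip(*table) regroups the values column by column (a transpose).
col_sums = [sum(col) for col in zip(*table)]

for i, total in enumerate(row_sums, start=1):
    print(f"sum row{i} = {total}")
for j, total in enumerate(col_sums, start=1):
    print(f"sum column{j} = {total}")
```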
import numpy as np
Columns:
np.array(table1d).sum(axis=0)
Rows:
np.array(table1d).sum(axis=1)
You can use list comprehensions and the sum function to obtain the desired result:
import random
rowCount = 3
colCount = 5
matrix = [ [random.randint(10,99) for _ in range(colCount)] for _ in range(rowCount) ]
for line in matrix:
    print(line)
for row in range(rowCount):
    print(f"sum row{row} = ",sum(matrix[row]))
for col in range(colCount):
    print(f"sum column{col} = ",sum(row[col] for row in matrix))
[90, 62, 86, 19, 13]
[33, 93, 38, 17, 29]
[11, 96, 91, 66, 81]
sum row0 = 270
sum row1 = 210
sum row2 = 345
sum column0 = 134
sum column1 = 251
sum column2 = 215
sum column3 = 102
sum column4 = 123
Here is a simple and straightforward two-loop solution, if you want to do both the sums together.
container = [[1, 2, 3, 4],
             [3, 2, 1, 5],
             [4, 5, 6, 6]]
rowSum, colSum, i = [0]*len(container), [0]*len(container[0]), 0
while i < len(container):
    j = 0
    while j < len(container[0]):
        rowSum[i] += container[i][j]
        colSum[j] += container[i][j]
        j += 1
    i += 1
print(rowSum, colSum)
Hope it helps. ✌

Find all unique combinations of a fixed size to reach a given average range

I have a range of integers e.g.
big_list = [1, 2, ..., 100]
and I need to find all fixed-length subsets of the numbers in this range whose average is within k of 50 (for k = 5, that means 45 to 55). For example, with a fixed size of 6 and an average of around 50:
sample = [71, 20, 23, 99, 25, 60]
The problem is that the lists have to be unique, with no repeated numbers.
The order doesn't matter, so [71, 20, 23, 99, 25, 60] and [20, 71, 23, 99, 25, 60] is just one combination.
I was thinking of just using itertools to generate all combinations and filtering them by my criteria. But the run time would be really bad, as the big list of numbers could range from size 10 to size 400.
How can I generate a set of lists that meet the above criteria?
Order is trivial to address.
Just build each arrangement under the constraint that the i-th number is greater than the (i-1)-th. In the algorithm below this is done by recursing with left set to the chosen number plus 1.
To keep the average between 45 and 55, consider the following recursive function:
import math
# left: smallest value allowed at this step (previous number is left-1)
# s: current sum
# d: depth (how many numbers are still to be chosen)
# c: combination
def fill(left, s, d, c):
    if d == 0:
        print(c, sum(c)/n)
        return
    # constraint c_sum >= 45*n
    # we look for minimal i such that
    # s + i + 100+99+...+(100-(d-1)+1) >= 45*n
    # s + i + 100*(d-1) - (d-2)(d-1)/2 >= 45*n
    # i >= 45*n - 100*(d-1) + (d-2)(d-1)/2 - s
    #
    # constraint c_sum <= 55*n
    # we look for maximal i such that
    # s + i + (i+1)+...+(i+(d-1)) <= 55*n
    # s + d*i + d(d-1)/2 <= 55*n
    # i <= ( 55*n - d(d-1)/2 - s )/d
    minleft = max(left, math.ceil(minBound*n - 100*(d-1) + (d-2)*(d-1)/2 - s))
    maxright = min(100, math.floor(( maxBound*n - d*(d-1)/2 - s )/d))
    for i in range(minleft, maxright+1):
        newsum = s + i
        c[d-1] = i
        fill(i+1, newsum, d-1, c)
n = 6
minBound = 45
maxBound = 55
fill(1, 0, n, [0]*n)
After further comments, it turned out that the OP is not interested in combinations as above, but in combinations such that no number appears twice across all combinations.
The algorithm can then be reduced to a very basic one:
n = 300
c = list(range(1, n))
while len(c) >= 6:
    print([c.pop(), c.pop(), c.pop(), c.pop(0), c.pop(0), c.pop(0)])
You can use recursion with a generator:
def combo(d, k, c = []):
    if len(c) == 6:
        yield c
    else:
        for i in d:
            _c = (sum(c)+i)/float(len(c)+1)
            if i not in c and (len(c) + 1 < 6 or 50-k <= _c <= 50+k):
                yield from combo(d, k, c+[i])
Of course, as @japreiss pointed out, this problem has a very bad worst-case time complexity. A possible workaround, however, is to treat combo as an iterator pool, and simply access the produced combinations on demand elsewhere in your code. For instance, to access the first 100 results:
result = combo(range(1, 100), 5)
for _ in range(100):
    print(next(result))
Output:
[1, 2, 3, 67, 98, 99]
[1, 2, 3, 67, 99, 98]
[1, 2, 3, 68, 97, 99]
[1, 2, 3, 68, 98, 99]
[1, 2, 3, 68, 99, 97]
[1, 2, 3, 68, 99, 98]
[1, 2, 3, 69, 96, 99]
[1, 2, 3, 69, 97, 98]
[1, 2, 3, 69, 97, 99]
[1, 2, 3, 69, 98, 97]
...
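For small inputs, the brute-force idea mentioned in the question can be sketched with `itertools.combinations`, which yields each subset exactly once in sorted order, so reordered duplicates never appear. The function name `subsets_near_average` and its parameters are made up for illustration. It still visits every combination, so it does not address the run-time concern for lists of size 400; being a generator, it only lets you take results on demand.

```python
from itertools import combinations, islice


def subsets_near_average(numbers, size, target, k):
    """Lazily yield each size-element subset whose mean is within k of target.

    Hypothetical brute-force helper: it still visits every combination, so it
    is only practical for small inputs.
    """
    # Compare sums instead of means to stay in integer arithmetic.
    low, high = (target - k) * size, (target + k) * size
    for combo in combinations(sorted(set(numbers)), size):
        if low <= sum(combo) <= high:
            yield combo


# Small demonstration: 3 numbers from 1..10 with an average within 1 of 5.
first_five = list(islice(subsets_near_average(range(1, 11), 3, 5, 1), 5))
print(first_five)
```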

Count the number of times values appear within a range of values

How do I output a list which counts and displays the number of times different values fit into a range?
Based on the below example, the output would be x = [0, 3, 2, 1, 0] as there are 3 Pro scores (11, 24, 44), 2 Champion scores (101, 888), and 1 King score (1234).
- P1 = 11
- P2 = 24
- P3 = 44
- P4 = 101
- P5 = 1234
- P6 = 888
totalsales = [11, 24, 44, 101, 1234, 888]
Here is the ranking corresponding to the sales:
Sales              Ranking
0-10               Noob
11-100             Pro
101-1000           Champion
1001-10000         King
10001-200000       Lord
This is one way, assuming your values are integers and ranges do not overlap.
from collections import Counter
# Ranges go to end + 1
score_ranges = [
    range(0, 11),  # Noob
    range(11, 101),  # Pro
    range(101, 1001),  # Champion
    range(1001, 10001),  # King
    range(10001, 200001)  # Lord
]
total_sales = [11, 24, 44, 101, 1234, 888]
# This counter counts how many values fall into each score range (by index).
# It works by taking the index of the first range containing each value (or -1 if none found).
c = Counter(next((i for i, r in enumerate(score_ranges) if s in r), -1) for s in total_sales)
# This converts the above counter into a list, taking the count for each index.
result = [c[i] for i in range(len(score_ranges))]
print(result)
# [0, 3, 2, 1, 0]
As a general rule homework should not be posted on stackoverflow. As such, just a pointer on how to solve this, implementation is up to you.
Iterate over the totalsales list and check whether each number is in range(start, stop) for each category. For each match, increment that category's count in your result list (although a dict might be a more apt way to store the result).
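One way that pointer could be sketched, using a dict as suggested. The names `RANKS` and `count_ranks` are made up for illustration, and the Lord range is assumed to start at 10001, directly after King. This relies on dicts keeping insertion order (Python 3.7+).

```python
# Hypothetical mapping from rank name to its range of sales; range() excludes
# its stop value, so each stop is the upper limit plus one.
RANKS = {
    "Noob": range(0, 11),
    "Pro": range(11, 101),
    "Champion": range(101, 1001),
    "King": range(1001, 10001),
    "Lord": range(10001, 200001),
}


def count_ranks(sales):
    """Count how many sales fall into each rank; unmatched values are skipped."""
    counts = {name: 0 for name in RANKS}
    for sale in sales:
        for name, sales_range in RANKS.items():
            if sale in sales_range:
                counts[name] += 1
                break
    return counts


totalsales = [11, 24, 44, 101, 1234, 888]
counts = count_ranks(totalsales)
print(counts)
print(list(counts.values()))  # [0, 3, 2, 1, 0]
```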
Here is a possible solution that uses no modules such as numpy or collections:
totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 200000]
output = [0]*len(bins)
for s in totalsales:
    slot = next(i for i, x in enumerate(bins) if s <= x)
    output[slot] += 1
output
>>> [0, 3, 2, 1, 0]
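Since the upper bounds are sorted, the slot lookup can also be done with a binary search from the standard library's `bisect` module. A minimal sketch, assuming non-negative integer sales and inclusive upper bounds; the name `upper_bounds` and the choice to ignore sales above the last bound are assumptions for illustration:

```python
from bisect import bisect_left

totalsales = [11, 24, 44, 101, 1234, 888]
# Inclusive upper bound of each rank, in ascending order.
upper_bounds = [10, 100, 1000, 10000, 200000]

output = [0] * len(upper_bounds)
for s in totalsales:
    # bisect_left returns the index of the first bound that is >= s.
    slot = bisect_left(upper_bounds, s)
    if slot < len(upper_bounds):  # ignore sales above the last bound
        output[slot] += 1

print(output)  # [0, 3, 2, 1, 0]
```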
If your sales-to-ranking mapping always follows a logarithmic curve, the desired output can be calculated in linear time using math.log10 with collections.Counter. Use an offset of 0.5 and the abs function to handle sales of 0 and 1:
from collections import Counter
from math import log10
counts = Counter(int(abs(log10(abs(s - .5)))) for s in totalsales)
[counts.get(i, 0) for i in range(5)]
This returns:
[0, 3, 2, 1, 0]
Here, I have used a pandas DataFrame to store the values, then bins and pd.cut to group the values into the right categories, and finally extracted the value counts into a list.
Let me know if it is okay.
import pandas as pd
import numpy
df = pd.DataFrame([11, 24, 44, 101, 1234, 888], columns=['P'])# Create dataframe
bins = [0, 10, 100, 1000, 10000, 200000]
labels = ['Noob','Pro', 'Champion', 'King', 'Lord']
df['range'] = pd.cut(df.P, bins, labels = labels)
df
outputs:
P range
0 11 Pro
1 24 Pro
2 44 Pro
3 101 Champion
4 1234 King
5 888 Champion
Finally, to get the value count. Use:
my = df['range'].value_counts().sort_index()  # counts the number of occurrences per category
output = list(map(int, my.tolist()))  # we want the output to be a list of integers
output
The result below:
[0, 3, 2, 1, 0]
You can use collections.Counter and a dict:
from collections import Counter
totalsales = [11, 24, 44, 101, 1234, 888]
ranking = {
    0: 'noob',
    10: 'pro',
    100: 'champion',
    1000: 'king',
    10000: 'lord'
}
c = Counter()
for sale in totalsales:
    for k in sorted(ranking.keys(), reverse=True):
        if sale > k:
            c[ranking[k]] += 1
            break
Or as a two-liner (credits to @jdehesa for the idea):
thresholds = sorted(ranking.keys(), reverse=True)
c = Counter(next((ranking[t] for t in thresholds if s > t)) for s in totalsales)

Grouping the list elements into user defined intervals

def interval():
    data = [1, 2, 12, 13, 22, 23, 32, 33, 42, 43, 52, 53, 62, 63, 72, 73, 82, 83, 92, 93]
    minimum = raw_input("Enter the min value")
    maximum = raw_input("Enter the max value")
    frequency = raw_input("Enter the Freq")
    x = []
    x.append(float(minimum))
    thesum = float(minimum)
    for i in range(0, int(maximum)):
        if thesum < float(maximum):
            thesum = thesum + float(frequency)
            x.append(thesum)
    print x
if __name__ == '__main__':
    interval()
Assume the user enters the min, max and freq as 0, 100 and 20 respectively.
So the intervals are 0-20, 20-40, 40-60, 60-80 and 80-100, and my output should be:
The values in 0-20 are [1,2,12,13]
The values in 20-40 are [22,23,32,33]
.. and so on!
If there are no values in a particular interval, the output should be a list with no values.
A very naive way to implement this would be as follows:
def group_items(data, low_value, high_value):
    return [value for value in data if value >= low_value and value <= high_value]
This function returns the list of numbers that lie in the range [low_value, high_value], inclusive at both ends. A value that sits exactly on a boundary is therefore reported in two intervals, because the upper bound of one interval is the lower bound of the next. Of course you can adjust this to your requirements.
minimum = int(raw_input("Enter the min value"))
maximum = int(raw_input("Enter the max value"))
frequency = int(raw_input("Enter the Freq"))
min_max_pairs = []
for x in xrange(minimum, maximum, frequency):
    pair = (x, x+frequency)
    min_max_pairs.append(pair)
This builds the list of (low, high) pairs from the minimum to the maximum entered by the user, in steps of the frequency. The inputs are converted with int() because xrange() needs integers, not the strings that raw_input() returns. For 0, 100 and 20 the pairs are:
>>> min_max_pairs
[(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
Now loop through the pairs and pass them to group_items() to get the required result:
for low, high in min_max_pairs:
    print "Range [", low, "-", high, "]:", group_items(data, low, high)
that results in
Range [ 0 - 20 ]: [1, 2, 12, 13]
Range [ 20 - 40 ]: [22, 23, 32, 33]
Range [ 40 - 60 ]: [42, 43, 52, 53]
Range [ 60 - 80 ]: [62, 63, 72, 73]
Range [ 80 - 100 ]: [82, 83, 92, 93]
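If boundary values should land in exactly one interval, half-open intervals [low, low + step) are one option. The following is a Python 3 sketch under that assumption; the name `group_by_interval` is made up for illustration, and a value equal to the overall maximum would fall outside every interval.

```python
def group_by_interval(data, minimum, maximum, step):
    """Group values into half-open intervals [low, low + step).

    Hypothetical helper: a value equal to an interval's upper bound belongs
    to the next interval, so no value is reported twice.
    """
    groups = {}
    for low in range(minimum, maximum, step):
        high = low + step
        groups[(low, high)] = [v for v in data if low <= v < high]
    return groups


data = [1, 2, 12, 13, 22, 23, 32, 33, 42, 43, 52, 53,
        62, 63, 72, 73, 82, 83, 92, 93]
for (low, high), values in group_by_interval(data, 0, 100, 20).items():
    print(f"The values in {low}-{high} are {values}")
```

Intervals without any values map to an empty list, which matches the requirement stated in the question.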
