Linear programming in Python using the PuLP library

I have a linear programming problem where I need to minimise the cost of manufacturing a number of items over a span of n months. Xi is the variable for the number of items manufactured in month i. Now I want to include a constraint: if Xi > 0, then a fixed number A is added to the objective function.
Obviously this can't be done with a boolean expression inside a for loop, for example, since Xi is a class object from the PuLP library. Does anybody know how to help me?
Docplex is not working.
Thank you so much.
from pulp import LpMinimize, LpProblem, LpVariable
model = LpProblem(name="production", sense=LpMinimize)
x = [LpVariable(name=f"x{i}", lowBound=0) for i in range(12)]
# standards
manufacturing_time_per_unit = 1/3
cost_of_hour = 12
storage_cost_per_unit = 3
# these are monthly (note: only 11 raw-material costs are listed)
cost_of_raw_materials_per_unit = [11, 10, 13, 9, 8, 7,
                                  10, 12, 12, 10, 9]
demand = [150, 200, 100, 300, 200,
          400, 300, 250, 150, 200, 300, 350]
available_hours = [250, 250, 200, 150, 200, 200,
                   150, 200, 250, 150, 150, 200]
cost_sum = 0
stored = [100]
for i in range(1, 13):
    cost_constraint = manufacturing_time_per_unit*x[i-1] <= available_hours[i-1]
    model += cost_constraint
    demand_constraint = x[i-1] + stored[i-1] >= demand[i-1]
    model += demand_constraint
    stored.append(x[i-1] + stored[i-1] - demand[i-1])
    cost_sum += manufacturing_time_per_unit*x[i-1] + stored[i-1]*storage_cost_per_unit
    storage_constraint = x[i-1] != 0  # this has no effect: PuLP variables do not support != constraints
    # if x[i-1] > 0:
    #     cost_sum += 1000
model += cost_sum
model.solve()

Add a binary variable y[i] with y[i] = 0 => x[i] = 0. (This implies x[i] > 0 => y[i] = 1.) I.e.
min  sum(i, 1000*y[i])
s.t. x[i] <= U*y[i]
     x[i] >= 0
     y[i] ∈ {0,1}
Here U is an upper bound on x[i].
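In PuLP, this big-M fixed-charge formulation might look like the sketch below; the bound U and the fixed cost of 1000 per active month are illustrative values, not from the original model:
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum
U = 1200           # assumed upper bound on monthly production
FIXED_COST = 1000  # fixed cost incurred in any month with nonzero production
model = LpProblem(name="fixed-charge-sketch", sense=LpMinimize)
x = [LpVariable(f"x{i}", lowBound=0) for i in range(12)]
y = [LpVariable(f"y{i}", cat=LpBinary) for i in range(12)]
for i in range(12):
    # y[i] = 0 forces x[i] = 0; the solver must set y[i] = 1 whenever x[i] > 0
    model += x[i] <= U * y[i]
# objective: the fixed charges (combine these with the variable costs of the full model)
model += lpSum(FIXED_COST * y[i] for i in range(12))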

Fastest way to get the index of a numpy array in a specific window

I have a small problem. I have two numpy arrays of different lengths:
x = np.array([0, 22, 34, 45, ..., 78540, 81000, ..., 1245775452]) # length = 12455231
y = np.array([10, 28, 45, 74, ..., 44444, 82002, ..., 1452424332]) # length = 13789122
And I have two parameters:
window = 30
transfer = 100
The task is very simple: find the indices of y that fall within the windows defined by the values of x, like so:
result = []
for i in range(len(x)):
    window_min = x[i] + transfer - window
    window_max = x[i] + transfer + window
    idx_y_window = np.argwhere((window_min < y) & (y < window_max)).ravel()
    if len(idx_y_window) > 0:
        result.append([i, idx_y_window[0]])  # I can take the first index in this list
The goal is to have the "association" of the index of x with the index of y in a specific window.
The problem is, this algorithm is very slow in pure Python.
Is there a simple way to do it with numpy so that it is faster?
Thank you
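A vectorized sketch using np.searchsorted, assuming y is sorted as it appears to be in the sample data (the small arrays below are stand-ins for the question's inputs):
import numpy as np
x = np.array([0, 22, 34, 45, 78540, 81000])
y = np.array([10, 28, 45, 74, 44444, 82002])
window, transfer = 30, 100
# first index of y strictly inside each window, and first index past the window
lo = np.searchsorted(y, x + transfer - window, side="right")
hi = np.searchsorted(y, x + transfer + window, side="left")
has_match = lo < hi  # windows that contain at least one y value
result = np.column_stack((np.nonzero(has_match)[0], lo[has_match]))
print(result)  # pairs [index into x, first matching index into y]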

Construct list of intervals consisting of midpoints of successive input values

I have a Python list consisting of data:
[60, 120, 180, 240, 480]
I need to calculate the intervals between the elements in the list and get the value in between the elements. For the above list, I'm looking for this output:
[[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]
The first and last values of the list are carried over directly, but the values in between are midpoints of successive elements: e.g. for 60 and 120, (120 - 60) / 2 = 30, and 60 + 30 = 90.
I cannot work out how to do this in a simple pythonic fashion, and I have buried myself in if/else statements to solve it.
You can do this fairly simply with pairwise. It's included in Python as of version 3.10, but if you're on an earlier version, you can get it from more_itertools or implement it yourself. (I also use mean, which is a handy convenience even though it's trivial to reimplement.)
from itertools import pairwise
from statistics import mean
original = [60, 120, 180, 240, 480]
midpoints = [mean(pair) for pair in pairwise(original)]
output = list(pairwise([original[0], *midpoints, original[-1]]))
print(output)
[(60, 90), (90, 150), (150, 210), (210, 360), (360, 480)]
Note that this outputs each pair as a tuple, rather than a list, as in your sample output. I think this is more idiomatic, and I would prefer it in my own code. However, if you'd prefer lists, it's a simple change:
from itertools import pairwise
from statistics import mean
original = [60, 120, 180, 240, 480]
midpoints = [mean(pair) for pair in pairwise(original)]
output = [
    list(pair) for pair in pairwise([original[0], *midpoints, original[-1]])
]
print(output)
[[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]
We can replace inner points by midpoints and then turn adjacent pairs into intervals:
a[1:-1] = map(mean, pairwise(a))
a[:] = pairwise(a)
As a non-modifying function:
from itertools import pairwise
from statistics import mean
def intervals(a):
    a = a[:]
    a[1:-1] = map(mean, pairwise(a))
    return list(pairwise(a))
print(intervals([60, 120, 180, 240, 480]))
Output:
[(60, 90), (90, 150), (150, 210), (210, 360), (360, 480)]
The intervals are tuples instead of lists, but like CrazyChucky I think that tuples are more idiomatic for this (unless you actually have a need for them to be lists).
The shortest version that I can do:
A = [60, 120, 180, 240, 480]
B = []
for i in range(len(A)):
    if i == 0:
        B.append([A[i], (A[i] + A[i+1])//2])
    elif i < len(A) - 1:
        B.append([B[i-1][1], (A[i] + A[i+1])//2])
    else:
        B.append([B[i-1][1], A[i]])
print(B)
x_list = [60, 120, 180, 240, 480]
y = []
for i in range(len(x_list) - 1):
    y.append(int((x_list[i] + x_list[i+1]) / 2))
print(y)
z = [[x_list[0], y[0]]]
for i in range(len(y) - 1):
    z.append([y[i], y[i+1]])
z.append([y[-1], x_list[-1]])
print(z)
Try the above.
The first printed line shows the midpoints, and the second shows the final list of intervals.
It's a matter of understanding indices. There are probably faster ways to do it, but while learning, it's better to iterate.
$ python example.py
[90, 150, 210, 360]
[[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]
$
You can use the following:
def getIntervals(l):
    l2 = []
    l2.append(l[0])
    for i in range(0, len(l) - 1):
        l2.append((l[i] + l[i+1]) // 2)
    l2.append(l[-1])
    ret = []
    for i in range(len(l2) - 1):
        ret.append([l2[i], l2[i+1]])
    return ret
And then you can call it like this:
x = [60, 120, 180, 240, 480]
y = getIntervals(x)
print(y) # prints [[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]
Basically, you iterate through the list once to find the endpoints of each interval, then iterate again to form the pairs of endpoints.
Most pythonic way I can come up with:
from itertools import pairwise # python ≥3.10 only
l = [60, 120, 180, 240, 480]
mid_points = map(lambda pair: sum(pair)//2, pairwise(l)) # [90, 150, 210, 360]
all_values = l[0], *mid_points, l[-1] # pre/append 60 and 480 resp. to mid_points
# all_values: (60, 90, 150, 210, 360, 480)
list(pairwise(all_values))
# [(60, 90), (90, 150), (150, 210), (210, 360), (360, 480)]
If you don't have Python 3.10 you can emulate pairwise with:
from itertools import tee
def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
Solution written in function format (without needing to import anything):
Function:
def get_interval_list(my_list):
    interval_start = 0
    interval_end = 0
    interval_list = []
    for i in range(len(my_list)):
        if i == 0:
            interval_start = my_list[0]
        else:
            interval_start = interval_end
        if i == len(my_list)-1:
            interval_end = my_list[-1]
        else:
            current_num = my_list[i]
            next_num = my_list[i+1]
            interval_end = (current_num + next_num) / 2
        interval_list.append([int(interval_start), int(interval_end)])
    return interval_list
Function with commented explanations:
def get_interval_list(my_list):
    # variables (self-explanatory... interval_list temporarily stores the interval
    # start and end pair for each number being iterated through)
    interval_start = 0
    interval_end = 0
    interval_list = []
    # the for loop runs the conditional statements below once for each number in
    # my_list, finds interval_start and interval_end for that number, and then
    # adds each pair to interval_list
    for i in range(len(my_list)):
        # this conditional checks whether the current number is the first item in
        # my_list, and if it is, assigns it to interval_start - if it isn't, the
        # interval_end of the previous number becomes the interval_start of the
        # current number (as per your requested output)
        if i == 0:
            interval_start = my_list[0]
        else:
            interval_start = interval_end
        # this conditional checks whether the current number is the last item in
        # my_list, and if it is, assigns it to interval_end - if it isn't, we
        # calculate interval_end by adding the current number and the next number
        # in my_list and dividing by 2
        if i == len(my_list)-1:
            interval_end = my_list[-1]
        else:
            current_num = my_list[i]
            next_num = my_list[i+1]
            interval_end = (current_num + next_num) / 2
        # the values of interval_start and interval_end are added as a pair to
        # interval_list (use int() here if you do not want your interval pairs
        # to be returned as floats)
        interval_list.append([int(interval_start), int(interval_end)])
    return interval_list
Sample run using function above:
list_of_numbers = [60, 120, 180, 240, 480]
print(get_interval_list(list_of_numbers))
Output:
[[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]
(Could be written more simply but I didn't want to sacrifice readability)
Just a side note: we don't need the question's two-step method (halve the difference between the two values, then add the result to the lower one) to calculate the midpoints. There is a much simpler way: just add the upper and lower limits and divide by 2, like so:
(60 + 120) / 2 = 90
(This mathematical method of finding midpoints works for any "range")
This is the best I can come up with, but I'm really unsure if this is the simplest approach:
def generate_split_intervals(input_list):
    first_list_value = input_list[0]
    last_list_value = input_list[-1]
    return_list = []
    last_append = 0
    for idx, item in enumerate(input_list):
        if item == first_list_value:
            last_append = int(first_list_value + (abs(input_list[idx+1] - first_list_value) / 2))
            return_list.append([first_list_value, last_append])
        elif item == last_list_value:
            return_list.append([last_append, last_list_value])
        else:
            this_append = int(item + (abs(input_list[idx+1] - item) / 2))
            return_list.append([last_append, this_append])
            last_append = this_append
    return return_list
arr = [60, 120, 180, 240, 480]
def foo(x, i):
    return [(x[i] + x[max(0, i - 1)])//2,
            (x[i] + x[min(len(x) - 1, i + 1)])//2]
[foo(arr, i) for i in range(len(arr))]
# [[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]
OR
lz = zip(zip(arr, [arr[0]] + arr[:-1]),
         zip(arr, arr[1:] + [arr[-1]]))
[[sum(a)//2, sum(b)//2] for a, b in lz]
# [[60, 90], [90, 150], [150, 210], [210, 360], [360, 480]]

How to split a list into 2 unsorted groupings based on the median

I am aiming to split a list into two subsections that don't themselves need to be sorted.
Imagine I have a list of length 10 (indices 0-9), for example:
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
I would want to sort it in a way that indices 0 through 4 contain values 10, 20, 30, 40, and 50 in any ordering.
For example:
# SPLIT HERE V
[40, 30, 20, 50, 10, 70, 60, 80, 90, 100]
I've looked into various divide and conquer sorting algorithms, but I'm uncertain which one would be the best to use in this case.
My current thought is to use quicksort, but I believe there is a better way, since nothing needs to be sorted exactly - the list only needs to be sorted in a "general" sense, with all values on their respective side of the median, in any order.
To me this seems to do the trick, unless you specifically need the output to be unordered:
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
sorted_arr = sorted(arr)
median_index = len(arr)//2
sub_list1, sub_list2 = sorted_arr[:median_index],sorted_arr[median_index:]
This outputs:
[10, 20, 30, 40, 50] [60, 70, 80, 90, 100]
The statistics package has a method for finding the median of a list of numbers. From there, you can use a for loop to separate the values into two separate lists based on whether or not it is greater than the median:
from statistics import median
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
med = median(arr)
result1 = []
result2 = []
for item in arr:
    if item <= med:
        result1.append(item)
    else:
        result2.append(item)
print(result1)
print(result2)
This outputs:
[50, 30, 20, 10, 40]
[90, 100, 70, 60, 80]
If you would like to solve the problem from scratch, you could implement the median-of-medians algorithm to find the median of an unsorted array in linear time. What you do next depends on your goal.
If you would like to do the reordering in place, you could use the result of the median-of-medians algorithm to select a pivot for a partition step (as in quicksort).
On the other hand, in Python you could just iterate through the array and append each value to a left or a right array, respectively.
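For reference, numpy's np.partition performs exactly this single partition step in linear time; a minimal sketch (not from the original answers):
import numpy as np
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
# index len(arr)//2 receives the element a full sort would put there; smaller
# values land (unordered) to its left, larger values to its right
split = np.partition(arr, len(arr) // 2)
print(split[:len(arr) // 2])  # the five smallest values, in any order
print(split[len(arr) // 2:])  # the five largest values, in any order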
The other current answers split the list into two lists, but based on your example, my impression is that although there are two groupings, the output should be a single list.
import numpy as np
# setup
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
# output array
unsorted_grouping = []
# get median
median = np.median(arr)
# loop over the array: if a value is greater than or equal to the median,
# append it (append always adds at the end of the list);
# otherwise insert it at position 0, the beginning / left side
for val in arr:
    if val >= median:
        unsorted_grouping.append(val)
    else:
        unsorted_grouping.insert(0, val)
# output
unsorted_grouping
[40, 10, 20, 30, 50, 90, 100, 70, 60, 80]
You can use the statistics module to calculate the median, and then use it to add each value to one group or the other:
import statistics
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
median = statistics.median(arr)
bins = [], []  # smaller and bigger values
for value in arr:
    bins[value > median].append(value)
print(bins[0])  # -> [50, 30, 20, 10, 40]
print(bins[1])  # -> [90, 100, 70, 60, 80]
You can do this with numpy (which is significantly faster if arr is large):
import numpy as np
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
arr = np.array(arr)
median = np.median(arr)
result1 = arr[arr <= median]
result2 = arr[arr > median]
Output:
array([50, 30, 20, 10, 40])
array([ 90, 100, 70, 60, 80])
And if you want one list as the output, you can do:
[*result1, *result2]
Output:
[50, 30, 20, 10, 40, 90, 100, 70, 60, 80]
My first Python program, so please bear with me.
Basically does QuickSort, as you suggest, but only sub-sorts the partition that holds the median index.
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
def partition(a, left, right):
    pivot = (left + right)//2
    a[left], a[pivot] = a[pivot], a[left]  # swap
    pivot = left
    left += 1
    while right >= left:
        while left <= right and a[left] <= a[pivot]:
            left += 1
        while left <= right and a[right] > a[pivot]:
            right -= 1
        if left <= right:
            a[left], a[right] = a[right], a[left]
            left += 1
            right -= 1
        else:
            break
    a[pivot], a[right] = a[right], a[pivot]
    return right
def medianSplit(array):
    left = 0
    right = len(array) - 1
    med = len(array) // 2
    while left < right:
        pivot = partition(array, left, right)
        if pivot > med:
            right = pivot - 1
        else:
            left = pivot + 1
def main():
    medianSplit(arr)
    print(arr)
main()

Any way to speed up itertools.product

I am using itertools.product to find the possible weights an asset can take given that the sum of all weights adds up to 100.
import itertools
import numpy as np
min_wt = 10
max_wt = 50
step = 10
nb_Assets = 5
weight_mat = []
for i in itertools.product(range(min_wt, (max_wt+1), step), repeat=nb_Assets):
    if sum(i) == 100:
        weight = [i]
        if np.shape(weight_mat)[0] == 0:
            weight_mat = weight
        else:
            weight_mat = np.concatenate((weight_mat, weight), axis=0)
The above code works, but it is too slow: it walks through combinations that are not acceptable (for example [50, 50, 50, 50, 50]), ultimately testing 3125 combinations instead of the 121 valid ones. Is there any way to apply the 'sum' condition inside the loop to speed things up?
Many improvements are possible.
For starters, the search space can be reduced using itertools.combinations_with_replacement(), because summation is commutative.
Also, the last addend should be computed rather than tested. For example, if t[:4] was (10, 20, 30, 35), you could compute t[4] as 100 - sum(t[:4]), giving a value of 5. This gives a 100-fold speed-up over trying one hundred values of x in (10, 20, 30, 35, x).
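Putting both ideas together might look like this sketch (the function name and the final grid check are mine, not from the answer):
from itertools import combinations_with_replacement
def weight_combos(min_wt=10, max_wt=50, step=10, nb_assets=5, total=100):
    weights = range(min_wt, max_wt + 1, step)
    # choose the first nb_assets - 1 weights in non-decreasing order...
    for head in combinations_with_replacement(weights, nb_assets - 1):
        # ...and compute the last weight instead of enumerating it
        last = total - sum(head)
        # the computed weight must lie on the step grid, stay within bounds,
        # and keep the tuple non-decreasing so no ordering is counted twice
        if head[-1] <= last <= max_wt and (last - min_wt) % step == 0:
            yield head + (last,)
print(list(weight_combos()))  # the 6 order-free combinations summing to 100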
You can write up a recursive algorithm for that which prunes all the impossible options early on:
def make_weight_combs(min_wt, max_wt, step, nb_assets, req_wt):
    weights = range(min_wt, max_wt + 1, step)
    current = []
    yield from _make_weight_combs_rec(weights, nb_assets, req_wt, current)
def _make_weight_combs_rec(weights, nb_assets, req_wt, current):
    if nb_assets <= 0:
        yield tuple(current)
    else:
        # Discard weights that cannot possibly be used
        while weights and weights[0] + weights[-1] * (nb_assets - 1) < req_wt:
            weights = weights[1:]
        while weights and weights[-1] + weights[0] * (nb_assets - 1) > req_wt:
            weights = weights[:-1]
        # Add all possible weights
        for w in weights:
            current.append(w)
            yield from _make_weight_combs_rec(weights, nb_assets - 1, req_wt - w, current)
            current.pop()
min_wt = 10
max_wt = 50
step = 10
nb_assets = 5
req_wt = 100
for comb in make_weight_combs(min_wt, max_wt, step, nb_assets, req_wt):
    print(comb, sum(comb))
Output:
(10, 10, 10, 20, 50) 100
(10, 10, 10, 30, 40) 100
(10, 10, 10, 40, 30) 100
(10, 10, 10, 50, 20) 100
(10, 10, 20, 10, 50) 100
(10, 10, 20, 20, 40) 100
(10, 10, 20, 30, 30) 100
(10, 10, 20, 40, 20) 100
...
If order of the weights does not matter (so, for example, (10, 10, 10, 20, 50) and (50, 20, 10, 10, 10) are the same), then you can modify the for loop as follows:
for i, w in enumerate(weights):
    current.append(w)
    yield from _make_weight_combs_rec(weights[i:], nb_assets - 1, req_wt - w, current)
    current.pop()
Which gives the output:
(10, 10, 10, 20, 50) 100
(10, 10, 10, 30, 40) 100
(10, 10, 20, 20, 40) 100
(10, 10, 20, 30, 30) 100
(10, 20, 20, 20, 30) 100
(20, 20, 20, 20, 20) 100
Comparing performance of the offered solutions:
import itertools
import timeit
import numpy as np
# original code from question
def f1():
    min_wt = 10
    max_wt = 50
    step = 10
    nb_assets = 5
    weight_mat = []
    for i in itertools.product(range(min_wt, (max_wt+1), step), repeat=nb_assets):
        if sum(i) == 100:
            weight = [i, ]
            if np.shape(weight_mat)[0] == 0:
                weight_mat = weight
            else:
                weight_mat = np.concatenate((weight_mat, weight), axis=0)
    return weight_mat
# code from question using list instead of numpy array
def f1b():
    min_wt = 10
    max_wt = 50
    step = 10
    nb_assets = 5
    weight_list = []
    for i in itertools.product(range(min_wt, (max_wt+1), step), repeat=nb_assets):
        if sum(i) == 100:
            weight_list.append(i)
    return weight_list
# calculating the last element of each tuple
def f2():
    min_wt = 10
    max_wt = 50
    step = 10
    nb_assets = 5
    weight_list = []
    for i in itertools.product(range(min_wt, (max_wt+1), step), repeat=nb_assets-1):
        the_sum = sum(i)
        if the_sum < 100:
            last_elem = 100 - the_sum
            if min_wt <= last_elem <= max_wt:
                weight_list.append(i + (last_elem, ))
    return weight_list
# recursive solution from user kaya3 (https://stackoverflow.com/a/58823843/9225671)
def constrained_partitions(n, k, min_w, max_w, w_step=1):
    if k < 0:
        raise ValueError('Number of parts must be at least 0')
    elif k == 0:
        if n == 0:
            yield ()
    else:
        for w in range(min_w, max_w+1, w_step):
            for p in constrained_partitions(n-w, k-1, min_w, max_w, w_step):
                yield (w,) + p
def f3():
    return list(constrained_partitions(100, 5, 10, 50, 10))
# recursive solution from user jdehesa (https://stackoverflow.com/a/58823990/9225671)
def make_weight_combs(min_wt, max_wt, step, nb_assets, req_wt):
    weights = range(min_wt, max_wt + 1, step)
    current = []
    yield from _make_weight_combs_rec(weights, nb_assets, req_wt, current)
def _make_weight_combs_rec(weights, nb_assets, req_wt, current):
    if nb_assets <= 0:
        yield tuple(current)
    else:
        # Discard weights that cannot possibly be used
        while weights and weights[0] + weights[-1] * (nb_assets - 1) < req_wt:
            weights = weights[1:]
        while weights and weights[-1] + weights[0] * (nb_assets - 1) > req_wt:
            weights = weights[:-1]
        # Add all possible weights
        for w in weights:
            current.append(w)
            yield from _make_weight_combs_rec(weights, nb_assets - 1, req_wt - w, current)
            current.pop()
def f4():
    return list(make_weight_combs(10, 50, 10, 5, 100))
I tested these functions using timeit like this:
print(timeit.timeit('f()', 'from __main__ import f1 as f', number=100))
The results using the parameters from the question:
# min_wt = 10
# max_wt = 50
# step = 10
# nb_assets = 5
0.07021828400320373 # f1 - original code from question
0.041302188008558005 # f1b - code from question using list instead of numpy array
0.009902548001264222 # f2 - calculating the last element of each tuple
0.10601829699589871 # f3 - recursive solution from user kaya3
0.03329997700348031 # f4 - recursive solution from user jdehesa
If I expand the search space (reduced step and increased assets):
# min_wt = 10
# max_wt = 50
# step = 5
# nb_assets = 6
7.6620834979985375 # f1 - original code from question
7.31425816299452 # f1b - code from question using list instead of numpy array
0.809070186005556 # f2 - calculating the last element of each tuple
14.88188026699936 # f3 - recursive solution from user kaya3
0.39385621099791024 # f4 - recursive solution from user jdehesa
Seems like f2 and f4 are the fastest (for the tested size of the data).
Let's generalise this problem; you want to iterate over k-tuples whose sum is n, and whose elements are within range(min_w, max_w+1, w_step). This is a kind of integer partitioning problem, with some extra constraints on the size of the partition and the sizes of its components.
To do this, we can write a recursive generator function; for each w in the range, the remainder of the tuple is a (k - 1)-tuple whose sum is (n - w). The base case is a 0-tuple, which is possible only if the required sum is 0.
As Raymond Hettinger notes, you can also improve the efficiency when k = 1 by just testing whether the required sum is one of the allowed weights.
def constrained_partitions(n, k, min_w, max_w, w_step=1):
    if k < 0:
        raise ValueError('Number of parts must be at least 0')
    elif k == 0:
        if n == 0:
            yield ()
    elif k == 1:
        if n in range(min_w, max_w+1, w_step):
            yield (n,)
    elif min_w*k <= n <= max_w*k:
        for w in range(min_w, max_w+1, w_step):
            for p in constrained_partitions(n-w, k-1, min_w, max_w, w_step):
                yield (w,) + p
Usage:
>>> for p in constrained_partitions(5, 3, 1, 5, 1):
...     print(p)
...
(1, 1, 3)
(1, 2, 2)
(1, 3, 1)
(2, 1, 2)
(2, 2, 1)
(3, 1, 1)
>>> len(list(constrained_partitions(100, 5, 10, 50, 10)))
121
Whenever you're iterating over all solutions to some sort of combinatorial problem, it's generally best to generate actual solutions directly, rather than generate more than you need (e.g. with product or combinations_with_replacement) and reject the ones you don't want. For larger inputs, the vast majority of time would be spent generating solutions which will get rejected, due to combinatorial explosion.
Note that if you don't want repeats in different orders (e.g. 1, 1, 3 and 1, 3, 1), you can change the recursive call to constrained_partitions(n-w, k-1, min_w, w, w_step) to only generate partitions where the weights are in non-increasing order.
Note that when you have N weights that sum to 100 and you have chosen N - 1 of them, the remaining weight is already defined as 100 minus the sum of the chosen weights, and it must be positive. The same limitation applies at every step, for any number of already-chosen weights.
Next, you don't want combinations that are just permutations of the same weights. This is why you can order the weights by value and require each next weight in the combination to be less than or equal to the previous one.
This immediately makes the search space much smaller, and it lets you abandon a branch of the search early.
Writing it with explicit loops first, or as a recursive algorithm, should make it much easier to understand and implement; see the sketch below.
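For instance, with three assets the explicit-loop version might look like this (parameter values assumed, not from the answer):
min_wt, max_wt, step, total = 10, 50, 10, 100
combos = []
for w1 in range(min_wt, max_wt + 1, step):
    for w2 in range(min_wt, w1 + 1, step):  # w2 <= w1: no permuted repeats
        w3 = total - w1 - w2                # the last weight is computed, not searched
        if min_wt <= w3 <= w2:              # w3 <= w2 preserves the ordering
            combos.append((w1, w2, w3))
print(combos)  # [(40, 30, 30), (40, 40, 20), (50, 30, 20), (50, 40, 10)]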

Merge sorting a 2d array

I'm stuck again trying to make this merge sort work.
Currently, I have a 2D array holding a Unix timestamp and a price (fig 1), which I merge sort using the code in fig 2. I check the first value of each row, i.e. array[x][0], and move the whole row depending on that value. However, the merge sort duplicates some rows and deletes others (fig 3). My question is: what am I doing wrong? I know it's the merge sort, but I can't see the fix.
fig 1
[[1422403200 100]
[1462834800 150]
[1458000000 25]
[1540681200 150]
[1498863600 300]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]]
fig 2
import numpy as np
def sort(data):
    if len(data) > 1:
        Mid = len(data) // 2
        l = data[:Mid]
        r = data[Mid:]
        sort(l)
        sort(r)
        z = 0
        x = 0
        c = 0
        while z < len(l) and x < len(r):
            if l[z][0] < r[x][0]:
                data[c] = l[z]
                z += 1
            else:
                data[c] = r[x]
                x += 1
            c += 1
        while z < len(l):
            data[c] = l[z]
            z += 1
            c += 1
        while x < len(r):
            data[c] = r[x]
            x += 1
            c += 1
    print(data, 'done')
unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600, 1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]
array = np.column_stack((unixdate, price))
sort(array)
print(array, 'sorted')
fig 3
[[1422403200 100]
[1458000000 25]
[1458000000 25]
[1498863600 300]
[1498863600 300]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]]
I couldn't spot any mistake in your code.
I tried your code, and I can tell that the problem does not happen, at least with regular Python lists: the function doesn't change the number of occurrences of any element in the list.
data = [
    [1422403200, 100],
    [1462834800, 150],
    [1458000000, 25],
    [1540681200, 150],
    [1498863600, 300],
    [1540771200, 100],
    [1540771200, 100],
    [1540771200, 100],
    [1540771200, 100],
    [1540771200, 100],
]
sort(data)
from pprint import pprint
pprint(data)
Output:
[[1422403200, 100],
[1458000000, 25],
[1462834800, 150],
[1498863600, 300],
[1540681200, 150],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100]]
Edit, taking into account the numpy context and the use of np.column_stack:
I expect what happens there is that np.column_stack actually creates a view mapping over the two arrays. To get a real array rather than a link to your existing arrays, you would copy that array:
array = np.column_stack((unixdate, price)).copy()
Edit 2, taking into account the numpy context
This behavior has actually nothing to do with np.column_stack; np.column_stack already performs a copy.
The reason your code doesn't work is that slicing behaves differently in numpy than in plain Python: slicing a numpy array creates a view that maps onto the original array's memory.
The erroneous lines are:
l = data[:Mid]
r = data[Mid:]
Since l and r just map onto two pieces of the memory held by data, they are modified when data is. This is why the lines data[c] = l[z] and data[c] = r[x] overwrite values and create duplicates as values are moved.
If data is a numpy array, we want l and r to be copies of data, not just views. This can be achieved using the copy method.
l = data[:Mid]
r = data[Mid:]
if isinstance(data, np.ndarray):
    l = l.copy()
    r = r.copy()
With the copies in place, I tested that the sort works.
Note
If you wanted to sort the data using python lists rather than numpy arrays, the equivalent of np.column_stack in vanilla python is zip:
z = zip([10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000])
z
# <zip at 0x7f6ef80ce8c8>
# `zip` creates an iterator, which is ready to give us our entries.
# Iterators can only be walked once, which is not the case of lists.
list(z)
# [(10, 100, 1000), (20, 200, 2000), (30, 300, 3000), (40, 400, 4000)]
The entries are (non-mutable) tuples. If you need the entries to be editable, map list on them:
z = zip([10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000])
li = list(map(list, z))
# [[10, 100, 1000], [20, 200, 2000], [30, 300, 3000], [40, 400, 4000]]
To transpose a matrix, use zip(*matrix):
def transpose(matrix):
    return list(map(list, zip(*matrix)))
transpose(li)
# [[10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
You can also sort a Python list li in place using li.sort(), or sort any iterable (lists are iterable) using sorted(li).
Here, I would use (tested):
sorted(zip(unixdate, price))
