Unable to Decrypt Python Algorithm [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 7 years ago.
I just started learning Python and thought the best way would be to solve a few easy algorithm questions. I came across this question:
A strange grid has been recovered from an old book. It has 5 columns and an infinite number of rows. The bottom row is considered the first row. The first few rows of the grid look like this:
..............
..............
20 22 24 26 28
11 13 15 17 19
10 12 14 16 18
 1  3  5  7  9
 0  2  4  6  8
The grid grows upwards forever!
Rows are indexed from bottom to top and columns are indexed from left to right.
The task is to find the integer in the c-th column of the r-th row of the grid.
Example:
Input: 6 3
Output: 25
The number in the 6th row and 3rd column is 25.
The solution for this problem was:
import sys

for line in sys.stdin:
    r = int(line.split(' ')[0])
    c = int(line.split(' ')[1])
    if r % 2 == 1:
        print(((r - 1) // 2) * 10 + (c - 1) * 2)   # // keeps the integer division the original Python 2 code relied on
    else:
        print(((r - 1) // 2) * 10 + (c - 1) * 2 + 1)
I didn't understand why we check r % 2 == 1, or why we use ((r-1)/2)*10 + ((c-1)*2) + 1.

Look at just the odd-numbered rows. In column one the values are 0, 10, 20. In column two the values are 2, 12, 22. In column three, 4, 14, 24.
Now look at just the even-numbered rows. In column one the values are 1, 11, 21. Column two: 3, 13, 23. Column three: 5, 15, 25.
Do you see how as you move up the rows, the value increases by ten? Not on every row, but rather on every other row. This is why we have ((r-1)/2)*10 - first we subtract one from r, then integer-divide by two (discarding the remainder), then multiply by ten. This gives us the value in the tens place.
Look again at the odd-numbered rows. In row one, the values are 0, 2, 4, 6, 8. In row three: 10, 12, 14, 16, 18. Row five: 20, 22, 24, 26, 28.
Now back to the even-numbered rows. In row two we have 1, 3, 5, 7, 9. Row four: 11, 13, 15, 17, 19.
Do you see how, within each row, the ones digits increase by two? In the odd-numbered rows they are the even numbers; in the even-numbered rows they are the odds. This is why we have if r%2 == 1: to check whether we are dealing with an odd or even row, in order to handle this branching behavior.
If r is odd, we calculate the (c-1)-th multiple of 2 - this is (c-1)*2. On the other hand, if r is even, we calculate the (c-1)-th multiple of 2, plus 1 (thus making the value odd): (c-1)*2 + 1.
Since the value generated by knowing the row number describes the tens digit, and the value generated by knowing the column number describes the ones digit, we can just add these two values together. That is ((r-1)/2)*10 + (c-1)*2 in the case where r is odd and ((r-1)/2)*10 + (c-1)*2 + 1 in the case where r is even.
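To tie the two pieces together, here is the same logic as a small function (a sketch; grid_value is my name, not from the original):
def grid_value(r, c):
    """Value at row r (counting from the bottom) and column c (from the left)."""
    tens = ((r - 1) // 2) * 10      # increases by ten every other row
    ones = (c - 1) * 2              # even offsets within a row
    return tens + ones if r % 2 == 1 else tens + ones + 1

print(grid_value(6, 3))  # 25, matching the example above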

Related

Printing the number of different numbers in python

I would like to ask a question about printing the number of different numbers in Python.
for example:
Let us say that I have the following list:
X = [5, 5, 5]
Since here we have only one number, I want to build code that recognizes that we have only one number here, so the output must be:
1
The number is: 5
Let us say that I have the following list:
X = [5,4,5]
Since here we have two numbers (5 and 4), I want the code to recognize that we have only two numbers here, so the output must be:
2
The numbers are: 4, 5
Let us say that I have the following list:
X = [24,24,24,24,24,24,24,24,26,26,26,26,26,26,26,26]
Since here we have two numbers (24 and 26), I want the code to recognize that we have only two numbers here, so the output must be:
2
The numbers are: 24, 26
You could keep track of unique numbers with a set object:
X = [1, 2, 3, 3, 3]
S = set(X)
n = len(S)
print(n, S)  # 3 {1, 2, 3}
Bear in mind sets are unordered, so you would need to convert back to a list and sort them if needed.
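For example, to reproduce the exact output format asked for in the question, you could sort the set back into a list (a small sketch):
X = [5, 4, 5]
uniques = sorted(set(X))                  # unique values, back in ascending order
print(len(uniques))                       # 2
print("The numbers are:", ", ".join(map(str, uniques)))  # The numbers are: 4, 5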
You can convert the list into a set, which will remove duplicates, and then convert it back into a list:
list(set(X))
You can try numpy.unique and use len() on the result.
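A quick sketch of that suggestion:
import numpy as np

X = [24, 24, 24, 26, 26]
u = np.unique(X)        # sorted unique values: array([24, 26])
print(len(u))           # 2
print(u)                # [24 26]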
May I ask whether we can use set() to read the data in a specific column of a pandas DataFrame?
For example, I have the following the DataFrame:
df1 = [ 0   -10   2   5
        1    24   5  10
        2    30   3   6
        3    30   2   1
        4    30   4   5 ]
where the first column is the index.
I first tried to isolate the second column:
[-10
24
30
30
30]
using the following: x = pd.DataFrame(df1, columns=[0])
Then I transposed the column using XX = x.T, and applied the set() function.
However, instead of obtaining
[-10 24 30]
I got the following: [0 1 2 3 4]
So set() read the index instead of reading the first column.
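For what it's worth, a sketch of a simpler route (assuming a frame like the one above, with the data column labeled 0): iterate the column's values rather than the frame itself, since iterating a DataFrame (or a transposed one) yields its column labels.
import pandas as pd

df1 = pd.DataFrame({0: [-10, 24, 30, 30, 30],
                    1: [2, 5, 3, 2, 4],
                    2: [5, 10, 6, 1, 5]})
print(set(df1[0]))       # {-10, 24, 30}
print(df1[0].unique())   # [-10  24  30], pandas' built-in, keeps first-seen order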

Explosion of memory when using pandas .loc with non-matching indices + assignment giving duplicate axis error

This is an observation from Most pythonic way to concatenate pandas cells with conditions.
I am not able to understand why the third solution takes more memory compared to the first one.
If I don't sample, the third solution does not give a runtime error, so clearly something is weird.
To emulate a large dataframe I tried to resample, but I never expected to run into this kind of error.
Background
Pretty self-explanatory: one line, looks pythonic.
df['city'] + (df['city'] == 'paris')*('_' + df['arr'].astype(str))
s = """city,arr,final_target
paris,11,paris_11
paris,12,paris_12
dallas,22,dallas
miami,15,miami
paris,16,paris_16"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(s)).sample(1000000, replace=True)
df
Speeds
%%timeit
df['city'] + (df['city'] == 'paris')*('_' + df['arr'].astype(str))
# 877 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df['final_target'] = np.where(df['city'].eq('paris'),   # assumes numpy was imported as np
                              df['city'] + '_' + df['arr'].astype(str),
                              df['city'])
# 874 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If I don't sample, there is no error and the outputs also match exactly.
Error (updated; this only happens when I sample from the dataframe)
%%timeit
df['final_target'] = df['city']
df.loc[df['city'] == 'paris', 'final_target'] += '_' + df['arr'].astype(str)
MemoryError: Unable to allocate 892. GiB for an array with shape (119671145392,) and data type int64
For smaller inputs (sample size 100) we get a different error, pointing to a problem with the differing sizes. But what's up with the memory allocations and the sampling?
ValueError: cannot reindex from a duplicate axis
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-57c5b10090b2> in <module>
1 df['final_target'] = df['city']
----> 2 df.loc[df['city'] == 'paris', 'final_target'] += '_' + df['arr'].astype(str)
~/anaconda3/lib/python3.8/site-packages/pandas/core/ops/methods.py in f(self, other)
99 # we are updating inplace so we want to ignore is_copy
100 self._update_inplace(
--> 101 result.reindex_like(self, copy=False), verify_is_copy=False
102 )
103
I re-ran them from scratch each time.
Update
This is part of what I figured out:
s = """city,arr,final_target
paris,11,paris_11
paris,12,paris_12
dallas,22,dallas
miami,15,miami
paris,16,paris_16"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(s)).sample(10, replace=True)
df
city arr final_target
1 paris 12 paris_12
0 paris 11 paris_11
2 dallas 22 dallas
2 dallas 22 dallas
3 miami 15 miami
3 miami 15 miami
2 dallas 22 dallas
1 paris 12 paris_12
0 paris 11 paris_11
3 miami 15 miami
Indices are repeated when sampling with replacement.
So resetting the indices resolved the problem, even though df.arr and the .loc selection have essentially different sizes; replacing the right-hand side with df.loc[df['city'] == 'paris', 'arr'].astype(str) also solves it, just as 2e0byo pointed out.
Still, can someone explain how .loc works, and why memory explodes when the indices contain duplicates and don't match?
@2e0byo hit the nail on the head saying pandas' algorithm is "inefficient" in this case.
As far as .loc, it's not really doing anything remarkable. Its use here is analogous to indexing a numpy array with a boolean array of the same shape, with an added dict-key-like access to a specific column - that is, df['city'] == 'paris' is itself a dataframe, with the same number of rows and the same indexes as df, with a single column of boolean values. df.loc[df['city'] == 'paris'] then gives a dataframe consisting of only the rows that are true in df['city'] == 'paris' (that have 'paris' in the 'city' column). Adding the additional argument 'final_target' then just returns only the 'final_target' column of those rows, instead of all three (and because it only has one column, it's technically a Series object - the same goes for df['arr']).
The memory explosion happens when pandas actually tries to add the two Series. As @2e0byo pointed out, it has to reshape the Series to do this, and it does this by calling the first Series' align() method. During the align operation, the function pandas.core.reshape.merge.get_join_indexers() calls pandas._libs.join.full_outer_join() (line 155) with three arguments: left, right, and max_groups (point of clarification: these are their names inside the function full_outer_join). left and right are integer arrays containing the indexes of the two Series objects (the values in the index column), and max_groups is the maximum number of unique elements in either left or right (in our case, that's five, corresponding to the five original rows in s).
full_outer_join immediately turns and calls pandas._libs.algos.groupsort_indexer() (line 194), once with left and max_groups as arguments and once with right and max_groups. groupsort_indexer returns two arrays - generically, indexer and counts (for the invocation with left, these are called left_sorter and left_count, and correspondingly for right). counts has length max_groups + 1, and each element (excepting the first one, which is unused) contains the count of how many times the corresponding index group appears in the input array. So for our case, with max_groups = 5, the count arrays have shape (6,), and elements 1-5 represent the number of times the 5 unique index values appear in left and right.
The other array, indexer, is constructed so that indexing the original input array with it returns all the elements grouped in ascending order - hence "sorter." After having done this for both left and right, full_outer_join chops up the two sorters and strings them up across from each other. full_outer_join returns two arrays of the same size, left_idx and right_idx - these are the arrays that get really big and throw the error.
The order of elements in the sorters determines the order they appear in the final two output arrays, and the count arrays determine how often each one appears. Since left goes first, its elements stay together - in left_idx, the first left_count[1] elements in left_sorter are repeated right_count[1] times each (aaabbbccc...). At the same place in right_idx, the first right_count[1] elements are repeated in a row left_count[1] times (abcabcabc...). (Conveniently, since the 0 row in s is a 'paris' row, left_count[1] and right_count[1] are always equal, so you get x amount of repeats x amount of times to start off). Then the next left_count[2] elements of left_sorter are repeated right_count[2] times, and so on... If any of the counts elements are zero, the corresponding spots in the idx arrays are filled with -1, to be masked later (as in, right_count[i] = 0 means elements in right_idx are -1, and vice versa - this is always the case for left_count[3] and left_count[4], because rows 2 and 3 in s are non-'paris').
In the end, the _idx arrays have a number of elements equal to N_elements, which can be calculated as follows:
left_nonzero = (left_count[1:] != 0)
right_nonzero = (right_count[1:] != 0)
left_repeats = left_count[1:]*left_nonzero + np.ones(len(left_count) - 1)*(1 - left_nonzero)
right_repeats = right_count[1:]*right_nonzero + np.ones(len(right_count) - 1)*(1 - right_nonzero)
N_elements = sum(left_repeats*right_repeats)
The corresponding elements of the count arrays are multiplied together (with all the zeros replaced with ones), and added together to get N_elements.
You can see this figure grows pretty quickly (O(n^2)). For an original dataframe of 1,000,000 sampled rows, with each row appearing about equally often, the count arrays look something like:
left_count = array([0, 2e5, 2e5, 0, 0, 2e5])
right_count = array([0, 2e5, 2e5, 2e5, 2e5, 2e5])
for a total length of about 1.2e11. In general, for an initial sample of N rows (df = pd.read_csv(io.StringIO(s)).sample(N, replace=True)), the final size is approximately 0.12*N**2.
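A quick sanity check of that estimate (a sketch; it assumes the million samples split evenly across the five original rows):
N = 1_000_000
per_group = N / 5                  # ~2e5 occurrences of each original index
paris, other = 3, 2                # 3 'paris' rows, 2 others in s
# matching groups contribute count*count elements; groups missing from
# `left` contribute right_count elements (filled with -1 on the left side)
n_elements = paris * per_group**2 + other * per_group
print(f"{n_elements:.3g}")         # 1.2e+11, i.e. ~0.12 * N**2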
An Example
It's probably helpful to look at a small example to see what full_outer_join and groupsort_indexer are trying to do when they make those ginormous arrays. We'll start with a small sample of only 10 rows, and follow the various arrays to the final output, left_idx and right_idx. We'll start by defining the initial dataframe:
df = pd.read_csv(io.StringIO(s)).sample(10, replace=True)
df['final_target'] = df['city'] # this line doesn't change much, but meh
which looks like:
city arr final_target
3 miami 15 miami
1 paris 11 paris
0 paris 12 paris
0 paris 12 paris
0 paris 12 paris
1 paris 11 paris
2 dallas 22 dallas
3 miami 15 miami
2 dallas 22 dallas
4 paris 16 paris
df.loc[df['city'] == 'paris', 'final_target'] looks like:
1 paris
0 paris
0 paris
0 paris
1 paris
4 paris
and df['arr'].astype(str):
3 15
1 11
0 12
0 12
0 12
1 11
2 22
3 15
2 22
4 16
Then, in the call to full_outer_join, our arguments look like:
left = array([1,0,0,0,1,4]) # indexes of df.loc[df['city'] == 'paris', 'final_target']
right = array([3,1,0,0,0,1,2,3,2,4]) # indexes of df['arr'].astype(str)
max_groups = 5 # the max number of unique elements in either left or right
The function call groupsort_indexer(left, max_groups) returns the following two arrays:
left_sorter = array([1, 2, 3, 0, 4, 5])
left_count = array([0, 3, 2, 0, 0, 1])
left_count holds the number of appearances of each unique value in left - the first element is unused, and then there are 3 zeros, 2 ones, 0 twos, 0 threes, and 1 four in left.
left_sorter is such that left[left_sorter] = array([0, 0, 0, 1, 1, 4]) - all in order.
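You can reproduce these two arrays with plain numpy (a sketch; groupsort_indexer itself is compiled Cython, but a stable argsort plus bincount gives the same answer here):
import numpy as np

left = np.array([1, 0, 0, 0, 1, 4])
max_groups = 5

left_sorter = np.argsort(left, kind="stable")
left_count = np.concatenate(([0], np.bincount(left, minlength=max_groups)))
print(left_sorter)  # [1 2 3 0 4 5]
print(left_count)   # [0 3 2 0 0 1]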
Now right: groupsort_indexer(right, max_groups) returns
right_sorter = array([2, 3, 4, 1, 5, 6, 8, 0, 7, 9])
right_count = array([0, 3, 2, 2, 2, 1])
Once again, right_count contains the number of times each index value appears: the unused first element, then 3 zeros, 2 ones, 2 twos, 2 threes, and 1 four (note that elements 1, 2, and 5 of both count arrays are the same: these are the rows in s with 'city' = 'paris'). Also, right[right_sorter] = array([0, 0, 0, 1, 1, 2, 2, 3, 3, 4]).
With both count arrays calculated, we can calculate what size the idx arrays will be (a bit simpler with actual numbers than with the formula above):
N_total = 3*3 + 2*2 + 2 + 2 + 1*1 = 18
3 is element 1 of both count arrays, so we can expect something like [1, 1, 1, 2, 2, 2, 3, 3, 3] to start left_idx, since [1, 2, 3] starts left_sorter, and [2, 3, 4, 2, 3, 4, 2, 3, 4] to start right_idx, since right_sorter begins with [2, 3, 4]. Then we have twos, so [0, 0, 4, 4] for left_idx and [1, 5, 1, 5] for right_idx. Then left_count has two zeros while right_count has two twos, so the next four entries of left_idx are -1 and the next four elements of right_sorter go into right_idx: [6, 8, 0, 7]. Both counts finish with a one, so one each of the last elements in the sorters goes into the idx arrays: 5 for left_idx and 9 for right_idx, leaving:
left_idx = array([1, 1, 1, 2, 2, 2, 3, 3, 3, 0, 0, 4, 4,-1, -1, -1, -1, 5])
right_idx = array([2, 3, 4, 2, 3, 4, 2, 3, 4, 1, 5, 1, 5, 6, 8, 0 , 7, 9])
which is indeed 18 elements.
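You can confirm that count with pandas itself, since Series.align on duplicate indexes goes through the outer-join path described above (a sketch using the example's indexes):
import pandas as pd

left = pd.Series(range(6), index=[1, 0, 0, 0, 1, 4])
right = pd.Series(range(10), index=[3, 1, 0, 0, 0, 1, 2, 3, 2, 4])
l_aligned, r_aligned = left.align(right)
print(len(l_aligned), len(r_aligned))  # 18 18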
With both index arrays the same shape, pandas can construct two Series of the same shape from our original ones to do any operations it needs to, and then it can mask these arrays to get back sorted indexes. Using a simple boolean filter to look at how we just sorted left and right with the outputs, we get:
left[left_idx[left_idx != -1]] = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 4])
right[right_idx[right_idx != -1]] = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 4])
After going back up through all the function calls and modules, the result of the addition at this point is:
0 paris_12
0 paris_12
0 paris_12
0 paris_12
0 paris_12
0 paris_12
0 paris_12
0 paris_12
0 paris_12
1 paris_11
1 paris_11
1 paris_11
1 paris_11
2 NaN
2 NaN
3 NaN
3 NaN
4 paris_16
which is the result of the line result = op(self, other) in pandas.core.generic.NDFrame._inplace_method (line 11066), with op = pandas.core.series.Series.__add__ and self and other the two Series from before that we're adding.
So, as far as I can tell, pandas basically tries to perform the operation for every combination of identically-indexed rows (like, any and all rows with index 1 in the first Series should be operated with all rows index 1 in the other Series). If one of the Series has indexes that the other one doesn't, those rows get masked out. It just so happens in this case that every row with the same index is identical. It works (albeit redundantly) as long as you don't need to do anything in place - the trouble for the small dataframes arises after this when pandas tries to reindex this result back into the shape of the original dataframe df.
The split (the line that smaller dataframes make it past, but larger ones don't) is that line result = op(self, other) from above. Later in the same function (called, note, _inplace_method), the program exits at self._update_inplace(result.reindex_like(self, copy=False), verify_is_copy=False). It tries to reindex result so it looks like self, so it can replace self with result (self is the original Series, the first one in the addition, df.loc[df['city'] == 'paris', 'final_target']). And this is where the smaller case fails, because, obviously, result has a bunch of repeated indexes, and pandas doesn't want to lose any information when it deletes some of them.
One Last Thing
It's probably worth mentioning that this behaviour isn't particular to the addition operation here. It happens any time you try an arithmetic operation on two large dataframes with a lot of repeated indexes - for example, try just defining a second dataframe the exact same way as the first, df2 = pd.read_csv(io.StringIO(s)).sample(1000000, replace=True), and then try running df.arr*df2.arr. You'll get the same memory error.
Interestingly, logical and comparison operators have protections against doing this - they require identical indexes, and check for it before calling their operator method.
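Continuing the df2 example above (a sketch): an arithmetic operation happily aligns and blows up, while a comparison refuses up front:
try:
    df['arr'] == df2['arr']   # comparison operators verify the labels first
except ValueError as err:
    print(err)                # Can only compare identically-labeled Series objects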
I did all my stuff in pandas 1.2.4 and Python 3.7.10, but I've given links to the pandas GitHub, which is currently at version 1.3.3. As far as I can tell, the differences don't affect the results.
I could certainly be wrong about this, but isn't it because df["arr"] has a different shape from df.loc[df["city"] == "paris"]? So something funny is happening in Pandas' internal resampling.
If I explicitly truncate the dataframe myself it works:
df['final_target'] = df['city']
df.loc[df['city'] == 'paris', 'final_target'] += "_" + df.loc[df['city'] == 'paris', 'arr'].astype(str)
In which case, the answer would be 'because internally pandas has an algorithm for reshaping dataframes when adding different sizes which is inefficient in this case'.
I don't know if that qualifies as an answer as I've not looked more deeply into pandas.

Find the minimum difference of the 2 elements from the remaining array while iterating python

I am iterating through the array and every time I would like to find 2 elements with the minimum difference from the remaining array.
e.g. given array = [5, 3, 6, 1, 3], if the iterator i is at index 2, so array[i] = 6, I would like to find the minimum difference that 2 elements from the array, excluding 6, could give.
I was thinking of finding first and second maximum elements for every iteration, but I do not know how to ignore array[i] element. Any tips would be appreciated.
On the outer loop, track the index using enumerate(). Then on the inner loop which iterates the rest of the items (to get the minimum difference), also track the index so that we can skip it if it is equal to the outer loop's index.
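A minimal sketch of that comment's approach (brute force; all names are mine):
def min_diffs_excluding_each(arr):
    for i, excluded in enumerate(arr):                    # outer loop: index to skip
        rest = [v for j, v in enumerate(arr) if j != i]   # inner pass skips index i
        best = min(abs(a - b)
                   for x, a in enumerate(rest)
                   for b in rest[x + 1:])
        print(f"excluding {excluded} (index {i}): min diff {best}")

min_diffs_excluding_each([5, 3, 6, 1, 3])  # excluding 6 -> min diff 0 (3 and 3)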
Here are some solutions for you. We don't need to compute the differences of all pairs of numbers, as that would have quadratic time complexity (it checks every possible combination/pair). What we can do is simply sort the array; then, once sorted:
We only need the difference of each consecutive pair, e.g. for [1, 3, 10, 11, 12] we just need 3 - 1, 10 - 3, 11 - 10, and 12 - 11. There is no point computing e.g. 12 - 1, because we are sure it would be greater than any of the consecutive differences.
Aside from consecutive pairs, we also need the alternating pairs, so that if we remove a number, we still consider the difference of its previous and next neighbours, e.g. [10, 12, 14]: if the excluded item is 12, then 12 - 10 and 14 - 12 shouldn't be considered, but 14 - 10 should be! (See the small sketch after this list.)
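To make the two families of pairs concrete (a small sketch over a sorted sample):
nums = sorted([10, 14, 1, 12])            # [1, 10, 12, 14]
consecutive = list(zip(nums, nums[1:]))   # (1, 10), (10, 12), (12, 14)
alternating = list(zip(nums, nums[2:]))   # (1, 12), (10, 14) - the pairs that survive removing the middle element
print(consecutive, alternating)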
Solution 1
A bit complicated, but only O(n log n) in time complexity.
Sort the array. The sorted values must contain the original indices.
Store the differences in sorted order, but keep a maximum of 3 items, where those 3 items are the smallest differences (same idea as a bounded min-heap).
Why 3 items? Say for example we have [1, 10, 12, 14, 100]. Then we know that the minimum difference is 2 which is the result of 12 - 10 and 14 - 12. For item 1, the min diff is 2, same with items 10, 14, and 100. But for 12, it shouldn't be 2 because if we remove 12, the next min diff is 14 - 10 which is 4. This would be the worst case. So we need to store maximum of 3 minimum differences, which here will be 2 from 12 - 10, 2 from 14 - 12, and 4 from 14 - 10 so that we can catch the case for 12 which should pick the third option (4 from 14 - 10).
Iterate the original array. For each item, see the first applicable difference and display it. This would be the difference that wasn't a result of using the current item in the subtraction.
from bisect import insort

numbers = [14, 10, -11, 27, 12, 4, 20]
numbers_sorted = sorted(enumerate(numbers), key=lambda value: value[1])  # O(n log n)
differences = []
for index in range(1, len(numbers_sorted)):  # O(n); the binary search and pop on <differences> are negligible because it is fixed at the constant size of 3
    for prev in range(1, 2 if index == 1 else 3):  # subtract consecutive and alternating
        diff_tup = (
            numbers_sorted[index][1] - numbers_sorted[index - prev][1],
            numbers_sorted[index - prev],
            numbers_sorted[index],
        )
        insort(differences, diff_tup)
        if len(differences) > 3:
            differences.pop()
for index, num in enumerate(numbers):  # O(n); the iteration of <differences> is negligible because it is fixed at the constant size of 3
    for diff in differences:
        if index != diff[1][0] and index != diff[2][0]:
            print(f"{num}: min diff {diff[0]} from {diff[1][1]} and {diff[2][1]}")
            break
Solution 2
More straightforward, but O(n ^ 2) in time complexity.
Sort the array. The sorted values must contain the original indices.
Iterate the array in the primary loop.
For each item, iterate the sorted array.
Skip the item if it is the one in the primary loop.
Otherwise, subtract the numbers.
If it is less than the current minimum, set it as the new minimum.
Display the minimum for the current number
numbers = [14, 10, -11, 27, 12, 4, 20]
numbers_sorted = sorted(enumerate(numbers), key=lambda value: value[1])  # O(n log n)
for num_index, num in enumerate(numbers):  # O(n ^ 2)
    min_diff = None
    min_subtractors = None
    for index in range(1, len(numbers_sorted)):
        for prev in range(1, 2 if index == 1 else 3):  # subtract consecutive and alternating
            if num_index == numbers_sorted[index][0] or num_index == numbers_sorted[index - prev][0]:
                continue
            diff = numbers_sorted[index][1] - numbers_sorted[index - prev][1]
            if min_diff is None or diff < min_diff:
                min_diff = diff
                min_subtractors = (numbers_sorted[index - prev][1], numbers_sorted[index][1])
    print(f"{num}: min diff {min_diff} from {min_subtractors[0]} and {min_subtractors[1]}")
Output
14: min diff 2 from 10 and 12
10: min diff 2 from 12 and 14
-11: min diff 2 from 10 and 12
27: min diff 2 from 10 and 12
12: min diff 4 from 10 and 14
4: min diff 2 from 10 and 12
20: min diff 2 from 10 and 12

How to construct a rank array with numpy? (What is a rank array?)

I hope all of you are having a great day. In my Python class we are learning how to use NumPy, so we got an assignment about it. My question is this: what is a rank array, and how can I construct one using Python? My instructor tried to explain it with the lines below, but I did not understand anything, actually :(
These are the instructions:
rank_calculator(A) - 5 pts
Given a numpy ndarray A, return its rank array.
Input: [[ 9  4 15  0 18]
        [16 19  8 10  1]]
Return value: [[4 2 6 0 8]
               [7 9 3 5 1]]
The return value should be an ndarray of the same size and shape as the original array A.
So, can someone explain that? I am not so good at Python, unfortunately :(
You can use numpy.argsort multiple times to handle a matrix, as suggested in this answer on SO.
import numpy as np

inp = np.array([[9, 4, 15, 0, 18],
                [16, 19, 8, 10, 1]])
inp.ravel().argsort().argsort().reshape(inp.shape)
array([[4, 2, 6, 0, 8],
       [7, 9, 3, 5, 1]])
What is a rank matrix?
In summary, if I were to take all the integers in the matrix, and sort them smallest to largest, then assign each one a rank from 0 to 9, that would result in the rank matrix. Notice that the smallest is 0 which gets a rank of 0, while largest is 19, which gets the last rank of 9.
How the double argsort works
# printing them so they align nicely
print('Array ->', end='')
for i in inp.ravel().astype('str'):
    print(i.center(4), end='')
print('\n')
print('Sort1 ->', end='')
for i in inp.ravel().argsort().astype('str'):
    print(i.center(4), end='')
print('\n')
print('Sort2 ->', end='')
for i in inp.ravel().argsort().argsort().astype('str'):
    print(i.center(4), end='')
Array -> 9   4   15  0   18  16  19  8   10  1
Sort1 -> 3   9   1   7   0   8   2   5   4   6
Sort2 -> 4   2   6   0   8   7   9   3   5   1
Let's first summarize what argsort does: for each position in the sorted order, it gives the index of the element that belongs there. Knowing this, we can trace a backward logic, which is sort of triangular in nature. Let's start from Sort2, then Sort1, then Array.
sort2[0] is 4 because sort1[4] is 0: the 0th element of the array (the value 9) lands at position 4 of the sorted order, so its rank is 4.
sort2[9] is 1 because sort1[1] is 9: the 9th element of the array (the value 1) lands at position 1 of the sorted order, so its rank is 1.
sort2[6] is 9 because sort1[9] is 6: the 6th element of the array (the value 19) lands at position 9 of the sorted order, so its rank is 9.
It's a bit confusing to wrap your head around, but once you understand how argsort() works, you shouldn't have a problem.
Q) What is a rank array?
Ans: It's basically each element's position (rank) in the sorted order.
Basically, what your teacher is asking you to do is return each element's position if the values were sorted in ascending order.
CODE:
import numpy as np

A = np.array([[9, 4, 15, 0, 18],
              [16, 19, 8, 10, 1]])
flatA = A.flatten()
sorted_flatA = sorted(flatA)  # will become -> [0, 1, 4, 8, 9, 10, 15, 16, 18, 19]
# Use a MAP from each value of sorted_flatA to its index in sorted_flatA.
MAP = {}
for i in range(len(sorted_flatA)):
    MAP[sorted_flatA[i]] = i
# Then simply go through the 2D array and replace the values with their ranks.
res = np.zeros(A.shape, dtype=int)  # dtype=int so the ranks print as integers
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        res[i][j] = MAP[A[i][j]]
print(res)
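If other libraries are allowed, scipy has this ranking built in (a sketch; rankdata flattens the array by default, and its 'ordinal' method gives 1-based ranks, hence the - 1):
import numpy as np
from scipy.stats import rankdata

A = np.array([[9, 4, 15, 0, 18],
              [16, 19, 8, 10, 1]])
ranks = (rankdata(A, method="ordinal") - 1).reshape(A.shape).astype(int)
print(ranks)
# [[4 2 6 0 8]
#  [7 9 3 5 1]]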

Creating columns with numpy Python

I have some elements stored in a numpy array. I wish to store them in a ".txt" file. The catch is that it needs to fit a certain standard, which means each element needs to start a set number of columns into its line.
Example:
numpy.array[0] needs to start in line 1, col 26.
numpy.array[1] needs to start in line 1, col 34.
I use numpy.savetxt() to save the arrays to file.
Later I will implement this in a loop to create a large ".txt" file with coordinates.
Edit: this good example was provided below; it illustrates my struggle:
In [117]: np.savetxt('test.txt', A.T, '%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
The fmt option '%20d %10d' right-justifies, so where a number starts depends on how many digits it has. What I need is an option which lets me set the spacing from the left side, regardless of the other integers.
The template the numbers need to fit into:
XXXXXXXX.XXX YYYYYYY.YYY ZZZZ.ZZZ
Final Edit:
I solved it by creating a test which checks how many spaces the last float used. I was then able to predict the number of spaces the next float needed to fit the template.
Have you played with the fmt of np.savetxt?
Let me illustrate with a concrete example (the sort that you should have given us)
Make a 2 row array:
In [111]: A = np.arange(12).reshape(2, 6)
In [112]: A
Out[112]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
Save it, and get 2 rows, 6 columns
In [113]: np.savetxt('test.txt',A,'%d')
In [114]: cat test.txt
0 1 2 3 4 5
6 7 8 9 10 11
save its transpose, and get 6 rows, 2 columns
In [115]: np.savetxt('test.txt',A.T,'%d')
In [116]: cat test.txt
0 6
1 7
2 8
3 9
4 10
5 11
Put more detail into fmt to space out the columns
In [117]: np.savetxt('test.txt', A.T, '%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
I think you can figure out how to make a fmt string that puts your numbers in the correct columns (join 26 spaces etc, or use left and right justification - the usual Python formatting issues).
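For instance, to pin each number to a fixed starting column you can left-justify with a minus sign in the format (a sketch; the field widths 13/12/8 are my guess at the X/Y/Z template above):
import numpy as np

coords = np.array([[12345678.123, 9876543.210, 1234.567],
                   [23456.789, 8765.432, 12.345]])
# '%-13.3f' left-justifies in a 13-character field, so each column
# starts at the same offset no matter how wide the numbers are
np.savetxt('coords.txt', coords, fmt='%-13.3f%-12.3f%-8.3f')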
savetxt also takes an opened file. So you can open a file for writing, write one array, add some filler lines, and write another. Also, savetxt doesn't do anything fancy. It just iterates through the rows of the array, and writes each row to a line, e.g.
for row in A:
    file.write(fmt % tuple(row))
So if you don't like the control that savetxt gives you, write the file directly.
