Huge Numpy list - How to get n digit? - python

I currently have a huge database of randomly generated numbers in a NumPy array:
array([62051180209, 87882444506, 49821030805, ..., 54840854303,
21222836608, 24070750502])
Now I want to check how many numbers have, for example, the digits 05 or 15 at positions 3 and 4 (e.g. 62-05-1180209, like the first number in my list).
I would also like to check how many numbers have other digits at other positions, such as positions 5 and 6. The first number in my list has 11 there, for example.

Operations on strings take a lot more CPU and RAM than integers. It is A LOT faster to use integer math instead:
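import numpy as np

def get_matches(array, start, end, value):
    # Drop `start` trailing digits, keep the next `end - start` digits, then compare.
    return np.remainder(array // 10**start, 10**(end - start)) == value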
Explained:
array // 10**start drops start digits at the end, using whole number division
np.remainder drops everything except end-start trailing digits
== value checks if the value matches. Note that to check if two digits are 05, value should be just 5.
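Note that the integer approach above counts positions from the right-hand end of the number, while the question counts them from the left. The sketch below is one way to bridge that, assuming every number has exactly 11 digits as in the sample. The helper name digits_at and its total_digits parameter are made up for illustration and are not part of the original answer.

```python
import numpy as np

def digits_at(array, left_pos, width, total_digits=11):
    """Return the `width`-digit number that starts at 1-based position
    `left_pos`, counted from the left (hypothetical helper; assumes every
    entry has exactly `total_digits` digits)."""
    start = total_digits - (left_pos - 1) - width  # trailing digits to drop
    return (array // 10**start) % 10**width

# Sample values from the question; int64 is needed because they exceed 32 bits.
numbers = np.array([62051180209, 87882444506, 49821030805,
                    54840854303, 21222836608, 24070750502], dtype=np.int64)

pairs = digits_at(numbers, left_pos=3, width=2)
count_05 = np.count_nonzero(pairs == 5)                        # "05" is the integer 5
count_05_or_15 = np.count_nonzero(np.isin(pairs, [5, 15]))     # several allowed values
print(pairs, count_05, count_05_or_15)
```

Counting with np.count_nonzero on the boolean mask avoids building any strings, which is the point of the integer approach.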

As Random Davis already suggested, this might work:
import numpy as np
mylist = np.array([62011180209, 87882444506, 49821030805, 54840854303,21222836608, 24070750502])
def get_matches(mylist, start, end, value):
    value = str(value)
    return [str(i)[start:end+1] == value for i in mylist]
get_matches(mylist, start=3, end=4, value=11)
For that list this delivers the following result:
[True, False, False, False, False, False]
If multiple choices should be considered, then with a naive approach, the above function can be rewritten as follows:
def get_matches_multichoice(mylist, start, end, valuelist):
    valuelist = [str(value) for value in valuelist]
    return [str(i)[start:end+1] in valuelist for i in mylist]
Calling it on the example data above:
print(get_matches_multichoice(mylist, start=3, end=5, valuelist=np.array([111, 824, 408])))
then returns:
[True, True, False, True, False, False]

Related

What does sum(x%2==0) mean?? (python)

import numpy as np
x = np.array([1, -1, 2, 5, 7])
print(sum(x%2==0))
This is the code, and I can't understand what sum(x%2==0) means.
Does it sum the even numbers?
I'm studying for a school test, and my professor said the output of the above code is 1, but I can't see why.
x % 2 == 0 turns your array into [False, False, True, False, False],
because each element is replaced by a boolean that says whether that number is even.
Then the sum is evaluated, with False counting as 0 and True counting as 1:
0 + 0 + 1 + 0 + 0 = 1
import numpy as np
x = np.array([1, -1, 2, 5, 7])
# step 1: create an intermediate array which contains the modulo 2 of each element (if the element is even it will be True, otherwise False)
y = x % 2 == 0 # [False, False, True, False, False]
# step 2: sum the intermediate array up. In this case the False values count as 0 and the True values as 1. There is one True value so the sum is 1
z = sum(y) # 1
For your purposes, here's an explanation. For Stack Overflow's purposes, I'm recommending to close this question as it's more coding help than a novel coding question.
The operations in this expression are as follows:
# operation 1
intermediate_result_1 = x%2
# operation 2
intermediate_result_2 = (intermediate_result_1 == 0)
# operation 3
sum(intermediate_result_2)
Operation 1: the modulo operator essentially returns the remainder when the first term is divided by the second term. Most basic mathematical operations (e.g. +,-,*,/,%,==,!=, etc) are implemented element-wise in numpy, which means that the operation is performed independently on each element in the array. Thus, the output from operation 1:
intermediate_result_1 = np.array([1, 1, 0, 1, 1])
Operation 2: same for the equality operator ==. Each element of the array is compared to the right-hand value, and the resulting array has True (or 1) where the equality expression holds, and False (or 0) otherwise.
intermediate_result_2 = np.array([False, False, True, False, False])  # counts as [0, 0, 1, 0, 0] when summed
Operation 3: Lastly, Python's built-in sum() iterates over the array and adds up all of its values. Note that numpy provides its own sum function, which also allows summing along individual dimensions. The sum of this array's elements is 1.
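To illustrate the remark about numpy's own sum function, here is a small sketch. The 2-D array grid is made up for this example and does not come from the question.

```python
import numpy as np

x = np.array([1, -1, 2, 5, 7])
mask = x % 2 == 0            # [False, False, True, False, False]
total = int(mask.sum())      # numpy's own sum: counts the True values

# Hypothetical 2-D example to show summing along one dimension.
grid = np.array([[1, 2, 4],
                 [3, 5, 6]])
even = grid % 2 == 0
evens_per_column = even.sum(axis=0)  # even numbers in each column
evens_per_row = even.sum(axis=1)     # even numbers in each row
print(total, evens_per_column, evens_per_row)
```

On a 1-D array the built-in sum() and the array's .sum() method give the same count. The axis argument only matters once the array has more than one dimension.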
NumPy makes it easy to operate on a whole array at once.
As other answers already point out,
x%2==0 returns [False, False, True, False, False].
If you are still confused, try to think of it like this.
Let's write a function that checks whether a value is even:
def is_even(ele):
    return ele % 2 == 0
then we use the map function
map() function returns a map object(which is an iterator) of the
results after applying the given function to each item of a given
iterable (list, tuple etc.)
NOTE: copied from GeeksforGeeks
then we take a simple list and map it with this function like so:
l=[1, -1, 2, 5, 7] # this is not a np array
print(list(map(is_even, l))) # this prints [False, False, True, False, False]; map() alone returns a lazy iterator, so wrap it in list() to see the values
print(sum(map(is_even, l))) # this prints 1

How to generate a list of lists using probabilities for each element in Python

I want to verify whether I am doing this right. The question is:
The function random, called without arguments and assumed to be imported, returns a random real number in the interval [0, 1). Write a function init(N) that takes a strictly positive integer as input and returns an N×N grid (as a list of lists) in which each cell has the value True with probability 1/3. For example:
init(4) => [[True, False, False, False],[False, False, True, True],[False,False,False,False],[True,False, True, False]]
from random import random
def init(N):
    return [[True if random() < 1/3 else False for j in range(N)] for i in range(N)]
print(init(4))
I am confused: does a probability of 1/3 mean that the generated random value is less than 1/3, or does it refer to a weighted probability such as the one used in random.choice?
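One way to check the first reading is empirically. Because random() is uniform on [0, 1), the comparison random() < 1/3 is True with probability 1/3, so a large grid should contain roughly one third True cells. The sketch below is only a sanity check. The helper true_fraction, the grid size of 300, and the fixed seed are all made up for illustration.

```python
from random import random, seed

def init(N):
    # random() is uniform on [0, 1), so random() < 1/3 holds with probability 1/3.
    return [[random() < 1/3 for _ in range(N)] for _ in range(N)]

def true_fraction(grid):
    # Hypothetical helper: share of True cells in the grid.
    cells = [cell for row in grid for cell in row]
    return sum(cells) / len(cells)

seed(0)  # fixed seed only so that the run is repeatable
fraction = true_fraction(init(300))
print(fraction)  # should land close to 1/3
```

The comparison already yields a bool, so the True if ... else False wrapper in the original is not needed.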

Python - Detect inflection point in list and replace

The title is actually misleading, but I didn't know how to describe my problem in a short sentence. I don't care about inflection point, but I care about the point where the values switch from x > 1 to x < 1.
Consider the following array:
a = np.array([0.683, 0.819, 0.678, 1.189, 1.465, 0.93 , 0.903, 1.321, 1.321, 0.785, 0.875])
# do something... and here's what I want:
np.array([True, False, False, False, False, True, True, False, False, True, True])
Here are the rules:
First point in array is the starting point, and is always marked True
For a value to be marked True, it must be smaller than 1 (x < 1).
However, even if a value is smaller than 1, if it's between the first value smaller than 1 and the first value greater than 1, mark it as False.
In case my explanation doesn't make sense, here's the picture of what I want to do:
The decimal values in the array a are just ratios: current point / previous point. How can I do this in Python?
The code below does what you asked. Unfortunately, it doesn't use a list comprehension.
The first thing I did was write a function that finds the indexes of the first value below 1 and the first value above 1.
import numpy as np
a = np.array([0.683, 0.819, 0.678, 1.189, 1.465, 0.93 , 0.903, 1.321, 1.321, 0.785, 0.875])
# If a number is below 1 but lies between the first value below 1 and
# the first value above 1, it is marked False.
# Find the indexes of the first value below 1 and the first value above 1.
def find_indx(a):
    first_min = 0
    for i in range(len(a)):
        if a[i] < 1:
            first_min = i
            break
    first_max = 0
    for i in range(len(a)):
        if a[i] > 1:
            first_max = i
            break
    return [first_min, first_max]
Using this function you can set to False the values that are below 1 but lie in the interval between the first value below 1 and the first value above 1.
The two indexes are stored in "false_range".
Once you have that, it's quite easy. The first point is always True.
Points whose index lies inside "false_range" and whose value is below 1 become False.
For points outside "false_range", the result depends only on whether the value is above or below 1.
false_range = find_indx(a)
truth_list = []
for i in range(len(a)):
    # the first value is always True
    if i == 0:
        truth_list.append(True)
    # below 1 but inside the false range: mark False
    elif false_range[0] < i < false_range[1] and a[i] < 1:
        truth_list.append(False)
    # otherwise it depends only on whether the value is above or below 1
    elif a[i] > 1:
        truth_list.append(False)
    elif a[i] < 1:
        truth_list.append(True)
print(truth_list)
[True, False, False, False, False, True, True, False, False, True, True]
The printed list corresponds to the one you gave, but please test this solution before using it.
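For larger arrays, the same rules can be expressed without explicit loops. The sketch below is one possible vectorized reading, not the answerer's code. It assumes the rule amounts to "every value before the first value above 1 is False, except index 0", and the function name mark_points is made up for illustration.

```python
import numpy as np

def mark_points(a):
    """Vectorized sketch of the marking rules (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    result = a < 1                       # candidates: values below 1 (new array)
    above = a > 1
    if above.any():
        first_above = int(np.argmax(above))  # index of the first value above 1
        result[:first_above] = False         # suppress everything before it
    result[0] = True                     # the starting point is always True
    return result

a = np.array([0.683, 0.819, 0.678, 1.189, 1.465, 0.93,
              0.903, 1.321, 1.321, 0.785, 0.875])
marked = mark_points(a)
print(marked)
```

Unlike the loop version, this always returns one entry per input value, so a value exactly equal to 1 comes out as False instead of being skipped.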

representing recurrence by chaining iterables in Python

I'm solving a problem where I have levels in a binary tree. I'm given a level, then a position.
The second level is [True, False].
The third level is [True, False, False, True].
The fourth [True, False, False, True, False, True, True, False], and so on.
To solve the problem, I may need to calculate this sequence out many times to get the element at a given position at that level.
For the initial array pattern = [True, False]
I want to do something like:
for _ in range(level):
    pattern = pattern + [not elem for elem in pattern]
Obviously for large limits this is not working well for me. My attempts at a solution using the chain method from itertools has so far been fruitless. What is a memory efficient way to express this in Python?
EDIT
This did what I was looking for, but still did not meet the runtime requirements I was looking for.
from itertools import chain, tee

for _ in range(level):
    lsb, msb = tee(pattern)
    pattern = chain(lsb, map(lambda x: not x, msb))
Ultimately, the solution involved finding the global index of the target element and determining how many 'right' paths were taken from the root (base case = 1) to reach it, observing that the state passed from parent to child does not change if a left path is taken, but flips if a right path is taken. It appears that most of the clever solutions are some spin on this fact.
What is a memory efficient way to express this in Python?
Since the approach you're using doubles the memory needed on each iteration, it won't scale easily. It may be better to find an analytic approach.
The generator below takes O(1) time to generate each element. And, crucially, calculating the next value depends only on the index and the previous value.
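def gen():
    yield True
    n, prev = 1, 1
    while True:
        # Parentheses added for readability; precedence is the same as in
        # the unparenthesized n ^ n - 1 and x ^ x >> 1.
        x = n ^ (n - 1)
        y = x ^ (x >> 1)
        if y.bit_length() % 2:
            z = 1 - prev
        else:
            z = prev
        yield bool(z)
        prev = z
        n += 1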
A recurrence relation like this makes it possible to compute elements in constant memory. Implementing the idea with Cython or PyPy should increase performance significantly.
Trying to generate elements one by one is a bad idea, and saving them all is even worse. You only need one element's value, and you can compute it directly.
Suppose the element you want is at index 2**i + k, where k < 2**i. Then this element is the negation of the element at index k, and the element at index k can be computed the same way. You end up negating element 0 once for each set bit in your desired index's binary representation. If there are an even number of set bits, the value is True. Otherwise, the value is False.
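The bit-counting argument above can be written down directly. This is a minimal sketch that assumes 0-based positions within a level, and the function name element_at is made up for illustration.

```python
def element_at(index):
    """Value at 0-based `index` of the pattern, computed directly.

    Each set bit in the index corresponds to one negation of element 0
    (which is True), so an even number of set bits gives True.
    """
    return bin(index).count("1") % 2 == 0

# First eight elements, matching the fourth level from the question.
fourth_level = [element_at(i) for i in range(8)]
print(fourth_level)
# → [True, False, False, True, False, True, True, False]
```

Because every level is a prefix of the next one, the level number is not needed, only the position inside the level.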

Finding a 'spike' or drop in a dataset programmatically

If I have a dataset that looks like this
[0.523,0.445,0.558,0.492,0.440,0.502,0.742,0.802,0.821,0.811,0.804,0.860]
As you can see, there is a 'spike' in the values after 0.502. Is there a way to find this programmatically in Python?
I'm already using Numpy and Scipy; I'm sure those libraries contain something like this. I just don't know what this procedure is called.
An added bonus would be the ability to adjust the 'sensitivity' of detecting a spike or drop, since the dataset can be quite noisy. A spike would mean a sustained increase in the moving average of the values, and a drop would mean a sustained decrease.
The range of each value is [-1,1]. The number of values in the array would be 50-100.
I would recommend using the diff function of numpy:
import numpy
a = [0.523,0.445,0.558,0.492,0.440,0.502,0.742,0.802,0.821,0.811,0.804,0.860]
numpy.diff(a)
This would give you:
array([-0.078, 0.113, -0.066, -0.052, 0.062, 0.24 , 0.06 , 0.019,
-0.01 , -0.007, 0.056])
If the number is positive, it's a jump up; if it's negative, it's a jump down.
If you just want to find where there are spikes, up or down, try this:
abs(numpy.diff(a)) > 0.2
Adjusting the 0.2 up or down would make it less or more sensitive, respectively. This would give:
array([False, False, False, False, False, True, False, False, False,
False, False], dtype=bool)
It's fairly simple to find where two adjacent values in a sequence differ by at least a threshold:
def findSpikes(data, threshold=0.2):
    prev = None
    for i, v in enumerate(data):
        if prev is None:
            prev = v
            continue
        delta = abs(v - prev)
        if delta >= threshold:
            print("Found spike at index %d (value %f)" % (i, v))
        prev = v
For your sample data, it will print:
Found spike at index 6 (value 0.742000)
It's easy to convert the function to a generator; change the print line to yield i, v or something similar.
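The generator conversion mentioned above could look like the following sketch. The name iter_spikes is made up for illustration, and the threshold semantics are kept the same as in findSpikes.

```python
def iter_spikes(data, threshold=0.2):
    """Yield (index, value) wherever a value differs from its
    predecessor by at least `threshold` (generator form of findSpikes)."""
    prev = None
    for i, v in enumerate(data):
        if prev is not None and abs(v - prev) >= threshold:
            yield i, v
        prev = v

data = [0.523, 0.445, 0.558, 0.492, 0.440, 0.502,
        0.742, 0.802, 0.821, 0.811, 0.804, 0.860]
spikes = list(iter_spikes(data))
print(spikes)
# → [(6, 0.742)]
```

Yielding instead of printing lets the caller decide what to do with each spike, for example collecting the results in a list or stopping at the first one.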
