How to eliminate suspicious barcode (like 123456) data [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Here's some barcode data from a pandas DataFrame:
737318 Sikat Botol Pigeon 4902508045506 75170
737379 Natur Manual Breast Pump 8850851860016 75170
738753 Sunlight 1232131321313 75261
739287 Bodymist bodyshop 1122334455667 75296
739677 Bodymist ale 1234567890123 75367
I want to remove data that is suspicious (i.e. has too many repeated or successive digits), like 1232131321313, 1122334455667, 1234567890123, etc. I am very tolerant of false negatives, but want to avoid false positives (bad barcodes) as much as possible.

If you're worried about repeated and successive digits, you can take np.diff of the digits and then compare the result against a triangular distribution using a Kolmogorov-Smirnov test. For a random number, the difference between successive digits ranges from -9 to 9 and follows a (discrete) triangular distribution with its maximum at 0; the code below approximates it with a continuous triangular distribution on [-10, 10].
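import numpy as np
import scipy.stats as stat
# continuous triangular distribution on [-10, 10] with its peak at 0 (c = .5 puts the peak in the middle)
t = stat.triang(.5, loc = -10, scale = 20)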
Turning the bar codes into an array:
a = np.array(list(map(list, map(str, a))), dtype = int) # however you get `a` out of your dataframe
then build a mask with
mask = np.array([stat.kstest(i, t.cdf).pvalue > .5 for i in np.diff(a, axis = 1)])
testing:
np.array([stat.kstest(j, t.cdf).pvalue > .5 for j in np.diff(np.random.randint(0, 10, (1000, 13)), axis = 1)]).sum()
Out: 720
You'll have about a 30% false negative rate, but a p-value threshold of .5 should pretty much guarantee that the values you keep don't have too many successive or repeated digits. If you really want to be sure you've eliminated anything suspicious, you may also want to KS test the actual digits against stat.uniform(scale = 10) (to eliminate 1213141516171 and similar).
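The triangular shape assumed above can be checked exactly, without sampling, by enumerating all 100 ordered pairs of digits. This is a small sketch that assumes the digits are independent and uniform on 0-9 (the "random barcode" model the answer relies on); real barcodes need not satisfy it exactly, e.g. the last digit is a check digit determined by the others.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count every difference b - a over all ordered digit pairs (a, b).
counts = Counter(b - a for a, b in product(range(10), repeat=2))

# Exact probability of each difference, assuming uniform independent digits.
dist = {d: Fraction(n, 100) for d, n in sorted(counts.items())}

for d, p in dist.items():
    print(f"{d:+d}: {p}")
```

The printout shows differences from -9 to +9 with P(d) = (10 - |d|)/100, peaking at 1/10 for d = 0.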

As a first step, I would use the barcode's built-in validation mechanism, the checksum. As your barcodes appear to be GTIN barcodes (specifically GTIN-13), you can use this method:
>>> import math
>>> def CheckBarcode(s):
...     total = 0
...     for i in range(len(s) - 1):
...         total += int(s[i]) * ((i % 2) * 2 + 1)
...     return math.ceil(total / 10) * 10 - total == int(s[-1])
...
>>> CheckBarcode("4902508045506")
True
>>> CheckBarcode("8850851860016")
True
>>> CheckBarcode("1232131321313")
True
>>> CheckBarcode("1122334455667")
False
>>> CheckBarcode("1234567890123")
False
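The same GTIN-13 check can be applied to a whole list of codes. Below is a sketch using the equivalent formula check = (10 - total % 10) % 10. The helper name `gtin13_is_valid` is hypothetical. As the output above already shows, 1232131321313 passes the checksum, so this filter only removes some of the suspicious codes and would need to be combined with a heuristic like the one in the previous answer.

```python
def gtin13_is_valid(code: str) -> bool:
    """Return True if `code` is 13 digits and its last digit is the GTIN-13 check digit."""
    if len(code) != 13 or not code.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    check = (10 - total % 10) % 10
    return check == int(code[12])

barcodes = ["4902508045506", "8850851860016", "1232131321313",
            "1122334455667", "1234567890123"]
valid = [c for c in barcodes if gtin13_is_valid(c)]
print(valid)  # ['4902508045506', '8850851860016', '1232131321313']
```

With a DataFrame, something like `df[df["barcode"].astype(str).map(gtin13_is_valid)]` should work, assuming a column named `barcode` (hypothetical) whose values have not lost leading zeros.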

Related

i need more decimal places for pi calculation [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
I'm trying to make a pi calculator in Python, but I need more decimal places.
It would help a lot if someone edited my code and carefully explained what they did.
This is the code I'm using:
import math
d = 0
ans = 0
display = 0
while True:
    display += 1
    d += 1
    ans += 1/d**2
    if display == 1000000:
        print(math.sqrt(ans*6))
        display = 0
        # displays value calculated every 1m iterations
Output after ~85 million iterations: 3.14159264498239
I need more than 15 decimal places (3.14159264498239........)

You’re using a very slowly converging series for π²/6, so you are not going to get a very precise value this way. Floating-point limitations prevent further progress after 3.14159264498239, but you’re not going to get much further in any reasonable amount of time anyway. You can get around these issues by some combination of:
micro-optimising your code,
storing a list of values, reversing it and using math.fsum,
using decimal.Decimal,
using a better series (like this one),
using a method that converges to the value of π quickly, instead of a series (like this one),
using PyPy, or a faster language than Python,
from math import pi.
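As an illustration of combining two of those options (decimal.Decimal plus a faster-converging series), here is a sketch based on Machin's formula, π = 16·arctan(1/5) − 4·arctan(1/239). Machin's formula is my choice for this example and is not necessarily the series the links above pointed to; the function names and the 10 guard digits are assumptions.

```python
from decimal import Decimal, localcontext

def arctan_inv(x: int, eps: Decimal) -> Decimal:
    """arctan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ..., summed with Decimal."""
    x = Decimal(x)
    x_sq = x * x
    power = 1 / x              # holds 1/x**(2k+1)
    total = power
    k, sign = 1, -1
    while True:
        power /= x_sq
        term = power / (2 * k + 1)
        if term < eps:         # alternating series: the remaining error is below eps
            break
        total += sign * term
        sign, k = -sign, k + 1
    return total

def machin_pi(decimals: int) -> Decimal:
    """pi rounded to `decimals` decimal places via Machin's formula."""
    with localcontext() as ctx:
        ctx.prec = decimals + 10               # guard digits against rounding error
        eps = Decimal(10) ** -(decimals + 5)
        pi = 16 * arctan_inv(5, eps) - 4 * arctan_inv(239, eps)
        ctx.prec = decimals + 1                # one digit before the decimal point
        return +pi                             # unary plus rounds to ctx.prec

print(machin_pi(50))
```

Each term of arctan(1/5) shrinks by a factor of about 25, so even hundreds of digits need only a few hundred terms, compared with the billions the π²/6 series would need for a handful of extra digits.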

You could try a generator:
def oddnumbers():
    n = 1
    while True:
        yield n
        n += 2
def pi_series():
    odds = oddnumbers()
    approximation = 0
    while True:
        approximation += (4 / next(odds))
        yield approximation
        approximation -= (4 / next(odds))
        yield approximation
approx_pi = pi_series()
for x in range(10000000):
    print(next(approx_pi))

How to return the fractional part of a number? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
How can I get the fractional part of a number?
For example, I have a list of floats num = [12.73, 9.45] and want to get only the numbers after the decimal point, 73 and 45 in this case. How do I go about doing this?

One approach is using pure(ish) maths.
The short answer:
>>> num = [12.73, 9.45]
>>> [int((f % 1)*100) for f in num]
[73, 44]
Explanation:
The modulo operator returns the remainder once whole division is complete (to over-simplify).
Therefore, this returns the decimal value, i.e. the fractional part of the number:
>>> 12.73 % 1
0.7300000000000004
To get the decimal value as an integer, you can use:
>>> int((12.73 % 1)*100)
73
Just wrap this in a loop for all required values ... and you have the 'short answer' above.
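Note that the short answer prints 44, not 45, for 9.45: the float closest to 9.45 is slightly below it, so (9.45 % 1) * 100 is a little under 45 and int() truncates it. A small sketch of the fix, assuming you know the numbers have exactly two decimal places, is to round instead of truncate:

```python
num = [12.73, 9.45]

# int() truncates toward zero, so a value like 44.99999999999993 becomes 44.
truncated = [int((f % 1) * 100) for f in num]

# round() snaps to the nearest integer, recovering the intended digits.
rounded = [round((f % 1) * 100) for f in num]

print(truncated)  # [73, 44]
print(rounded)    # [73, 45]
```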

num = [12.73, 9.45]
result = list(map(lambda x: int(str(x).split('.')[1]),num))
print(result)

and want to get only the numbers after the period,
There is no such thing. Numbers don't have digits; the string representation of the numbers has digits. And even then, floating-point numbers are not precise; you may be shown 0.3 in one context and 0.30000000000000004 in another, for the same value.
It sounds like what you are actually after is the fractional part of the numbers. There are many ways to do this, but they all boil down to the same idea: it is the remainder when you divide (as a floating-point number) the input by 1.
For a single value, it looks like:
fractional_part = value % 1.0
or
# This built-in function performs the division and gives you
# both quotient and remainder.
integer_part, fractional_part = divmod(value, 1.0)
or
import math
fractional_part = math.fmod(value, 1.0)
or
import math
# This function is provided as a special case.
# It also gives you the integer part.
# Notice that the results are the other way around vs. divmod!
fractional_part, integer_part = math.modf(value)
To process each value in a list in the same way, use a list comprehension.
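For example, applying three of the approaches above to every element with list comprehensions. The extra value -2.5 is an illustrative assumption added to show that the methods differ for negative inputs: `%` always returns a result in [0, 1), while `math.fmod` and `math.modf` keep the sign of the input.

```python
import math

num = [12.73, 9.45, -2.5]

with_mod = [v % 1.0 for v in num]              # result always in [0, 1)
with_fmod = [math.fmod(v, 1.0) for v in num]   # result has the sign of v
with_modf = [math.modf(v)[0] for v in num]     # modf returns (fractional, integer)

print(with_mod)
print(with_fmod)
print(with_modf)
```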

regex: check if two phone numbers differ at most by 1 number [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a dataset of phone numbers that I want to check against each other. Basically the regex should throw a match if two phone numbers are at most 1 digit apart. For example, we have the following phone numbers:
+31612345678
+31612245678
These numbers are the same apart from position number 7 (the first number has a 3 while the second number has a 2). As these phone numbers differ by 1 digit, the regex should throw a match. It stands to reason that the regex should also throw a match if the phone numbers are exactly the same. In the following case, however, the regex should not throw a match, as the phone numbers differ by more than 1 digit:
+31612345678
+31611145678
Does anyone have a good regex in mind? I am writing the regex using the re module in python.

Depending on your use case - if you want to also catch "oh, you missed a digit" or "eh, that digit shouldn't have been there", use the edit distance between the two numbers instead.
You can use the Levenshtein edit distance to get a number for how many "edits" would be required between two numbers, for example by using the editdistance library for Python:
>>> import editdistance
>>> editdistance.eval('banana', 'bahama')
2
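If installing the third-party editdistance package isn't an option, the same Levenshtein distance can be computed in plain Python with the standard dynamic-programming recurrence. The function name `levenshtein` is hypothetical; a match is then `levenshtein(p1, p2) <= 1`, which also accepts a single missing or extra digit.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))   # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                    # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("banana", "bahama"))              # 2
print(levenshtein("+31612345678", "+31612245678"))  # 1
print(levenshtein("+31612345678", "+31611145678"))  # 2
print(levenshtein("+31612345678", "+3161234578"))   # 1 (one digit missing)
```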

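This may not be the best code, but it would do the job (it assumes both numbers have the same length):
from collections import Counter
a = '+31612345678'
b = '+31612245678'
def match(p1, p2):
    # Count how many positions agree (True) or differ (False).
    ct = Counter(x == y for x, y in zip(p1, p2))
    # A match means at most one differing position.
    return ct[False] <= 1
print(match(a, b))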

You wouldn't use a regular expression for this. If your phone numbers have the same length, something as simple as
def is_match(phone_nr_1, phone_nr_2):
    # Keep only the position pairs whose digits differ.
    diff = [pair for pair in zip(phone_nr_1, phone_nr_2) if pair[0] != pair[1]]
    return len(diff) <= 1
print(is_match("+31612345678", "+31612245678"))
#=> True
print(is_match("+31612345678", "+31611145678"))
#=> False
would do the job.

How to calculate sine and cosine without importing math? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I started programming in Python not too long ago and I am having trouble with a part of a program. The program will ask for input from the user, who can enter A, B, C, M, or Q. I have completed the A, M, and Q parts, but I can't figure out how to do the parts for B (calculate the sine of the number you want) and C (calculate the cosine).
All the information I was given was:
The power series approximation for the sine of X can be expressed as:
$\sin(X) = X - \frac{X^3}{3!} + \frac{X^5}{5!} - \frac{X^7}{7!} + \frac{X^9}{9!} - \cdots$
Note that an individual term in that power series can be expressed as $\frac{(-1)^k X^{2k+1}}{(2k+1)!}$, where $k = 0, 1, 2, 3, \ldots$
Oh, and (a while loop should do for this, right?):
When computing the sine of X or the cosine of X, the program will expand the power series
until the absolute value of the next term in the series is less than 1.0e-8 (the specified epsilon).
That term will not be included in the approximation.
And I can't use import math.
Can anyone give me an idea of how I can do this? I sincerely have no idea of where to even start hahaha.
Thanks in advance!
Hey guys, I've been trying to do this for the last 3 hours. I'm really new to programming and some of your answers made it a bit more understandable for me, but my program is not working; I really don't know how to do this. And yes, I went to speak with a tutor today but he didn't know either. So yeah, I guess I'll just wait until I get the program graded by my teacher and then I can ask him how it was supposed to be done. Thank you for all the answers though, I appreciate them! :)

>>> e = 2.718281828459045
>>> X = 0.1
>>> (e**(X*1j)).imag # sin(X)
0.09983341664682815
>>> (e**(X*1j)).real # cos(X)
0.9950041652780258
Verify
>>> from math import sin, cos
>>> sin(X)
0.09983341664682815
>>> cos(X)
0.9950041652780258
You'll probably get better marks if you sum up the series explicitly, though:
result = 0
n = 1
while True:
    term = ...  # the next term of the series, using n
    if abs(term) < epsilon:  # stop before adding a negligible term
        break
    result += term
    n += 2

It seems that you aren't supposed to import math because you are supposed to write your own function to compute sine. You are supposed to use the power series approximation.
I suggest you start by writing a factorial function, then write a loop that uses this factorial function to compute the power series.
If you still can't figure it out, I suggest you talk to your teacher or a teacher's assistant.
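Following that advice, here is a minimal sketch that uses only built-ins: a factorial function plus two while loops that stop at the first term whose absolute value is below 1.0e-8, without including that term, as the assignment requires. The cosine series, cos(x) = Σ (-1)^k x^(2k)/(2k)!, is the standard Maclaurin series; the assignment text quoted above only spells out the sine one. The function names are hypothetical, and the code is intended for moderate |x| (for very large arguments the terms grow huge and precision is lost).

```python
EPSILON = 1.0e-8

def factorial(n: int) -> int:
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def sine(x: float) -> float:
    """Sum (-1)**k * x**(2k+1) / (2k+1)! until a term drops below EPSILON."""
    total = 0.0
    k = 0
    while True:
        term = (-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1)
        if abs(term) < EPSILON:   # this term is not included
            break
        total += term
        k += 1
    return total

def cosine(x: float) -> float:
    """Sum (-1)**k * x**(2k) / (2k)! with the same stopping rule."""
    total = 0.0
    k = 0
    while True:
        term = (-1) ** k * x ** (2 * k) / factorial(2 * k)
        if abs(term) < EPSILON:
            break
        total += term
        k += 1
    return total

print(sine(0.1), cosine(0.1))
```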

Since the loop should finish once the last term drops below 1.0e-8 in absolute value, you should use a while loop:
while abs(last_term) >= 1.0e-8:
You will need a counter to keep track of k (starting from 0) and a variable to hold the last term:
k = 0
last_term = 1.0  # any value >= 1.0e-8, so the loop body runs at least once
while ...:
    last_term = ...  # formula here
and also a result variable, let's say sin_x:
sin_x = 0
while ...:
    ...
    sin_x += last_term
Note: the formula uses factorials, so you will need to define a function that computes the factorial of a number and use it properly.

Pi calculation in python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
n = number of iterations
For some reason, this code needs a lot more iterations than other code to get an accurate result. Can anyone explain why this is happening? Thanks.
n, s, x = 1000, 1, 0
for i in range(0, n, 2):
    x += s*(1/(1+i))*4
    s = -s
print(x)

As I mentioned in a comment, the only way to speed this is to transform the sequence. Here's a very simple way, related to the Euler transformation (see roippi's link): for the sum of an alternating sequence, create a new sequence consisting of the average of each pair of successive partial sums. For example, given the alternating sequence
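$a_0 - a_1 + a_2 - a_3 + a_4 - \cdots$
where all the $a_i$ are positive, the sequence of partial sums is:
$s_0 = a_0,\; s_1 = a_0 - a_1,\; s_2 = a_0 - a_1 + a_2,\; s_3 = a_0 - a_1 + a_2 - a_3,\; s_4 = a_0 - a_1 + a_2 - a_3 + a_4,\; \ldots$
and then the new derived sequence is:
$\frac{s_0 + s_1}{2},\; \frac{s_1 + s_2}{2},\; \frac{s_2 + s_3}{2},\; \frac{s_3 + s_4}{2},\; \ldots$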
That can often converge faster, and the same idea can be applied to this sequence. That is, create yet another new sequence averaging the terms of that sequence. This can be carried on indefinitely. Here I'll take it one more level:
from itertools import count
from math import pi
def leibniz():
    # Yield the partial sums of 4*(1 - 1/3 + 1/5 - 1/7 + ...)
    s, x = 1.0, 0.0
    for i in count(1, 2):
        x += 4.0*s/i
        s = -s
        yield x
def avg(seq):
    # Yield the average of each pair of successive terms of seq
    a = next(seq)
    while True:
        b = next(seq)
        yield (a + b) / 2.0
        a = b
base = leibniz()
d1 = avg(base)
d2 = avg(d1)
d3 = avg(d2)
for i in range(20):
    x = next(d3)
    print("{:.6f} {:8.4%}".format(x, (x - pi)/pi))
Output:
3.161905 0.6466%
3.136508 -0.1619%
3.143434 0.0586%
3.140770 -0.0262%
3.142014 0.0134%
3.141355 -0.0076%
3.141736 0.0046%
3.141501 -0.0029%
3.141654 0.0020%
3.141550 -0.0014%
3.141623 0.0010%
3.141570 -0.0007%
3.141610 0.0005%
3.141580 -0.0004%
3.141603 0.0003%
3.141585 -0.0003%
3.141599 0.0002%
3.141587 -0.0002%
3.141597 0.0001%
3.141589 -0.0001%
So after just 20 terms, we've already got pi to about 6 significant digits. The base Leibniz sequence still has only about 2 correct digits:
>>> next(base)
3.099944032373808
That's an enormous improvement. A key point here is that the partial sums of the base Leibniz sequence give approximations that alternate between "too big" and "too small". That's why averaging them gets closer to the truth. The same (alternating between "too big" and "too small") is also true of the derived sequences, so averaging their terms also helps.
That's all hand-wavy, of course. Rigorous justification probably isn't something you're interested in ;-)

That is because you are using the Leibniz series and it is known to converge very (very) slowly.
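To see how slow that is, here is a short sketch that measures the error of the Leibniz partial sum after n terms. The error shrinks roughly like 1/n (error × n stays close to 1), so every additional correct digit costs about ten times as many terms. The question's loop with n = 1000 and a step of 2 only sums 500 terms, which leaves an error of roughly 0.002. The helper name `leibniz` is hypothetical.

```python
import math

def leibniz(n_terms: int) -> float:
    """4 * (1 - 1/3 + 1/5 - ...) using the first n_terms terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

for n in (10, 100, 1000, 10000):
    err = abs(math.pi - leibniz(n))
    print(f"{n:>6} terms: error {err:.2e}, error * n = {err * n:.4f}")
```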
