Changing or preventing the use of E in Python? - python

I'm currently trying to write a script that will create a unique ID for a user based on a number of variables, like birthdate, name, hometown, etc. This creates a very long number that is completely unique to that user. However, to try to make the number even more unique, I want to randomly change some of the digits in the middle. This is what I have so far:
from random import randint

# id holds the long integer built from the user's details
rand = randint(1, 15)
tempid = id / 10**rand
if randint(1, 2) == 1:
    tempid = tempid + randint(2, 10000)
else:
    tempid = tempid - randint(5, 7500)
print(id)
id = tempid * (10**rand)
print(str(id))
The code is fairly simple. It makes the number much smaller by dividing it by a large power of 10, adds or subtracts a random number, and multiplies it back to its original length, with some changed digits in the middle. The only problem is that, because it must be an integer to be able to do any math with it, Python shortens it to 1.[something]e+[something]. This isn't helpful at all, because now it's not an ID. Is there any way I can change it back to its original form, where it's just a long string of digits, or perhaps change the code so it never becomes e? Thank you!

Unless this is a specific exercise, you do not want to generate unique IDs the way you do. It will fail. Use the uuid module instead.

Your problem is that id, when you print it, refers to a large float value, which is then printed in exponential notation. If it were an integer value, no e would be in the printout. The float value comes from your line
tempid = id / 10**rand
which, in Python 3.x, stores a float value in tempid. You later execute
id = tempid * (10**rand)
which multiplies a float by an integer, resulting in a float, and that float is what is printed in the next line.
You can avoid this in several ways. You can keep all the calculations in integers by replacing your division line with
tempid = id // 10**rand
That extra slash means integer (floor) division, so tempid here and id later are integers. However, this may change the resulting values, because the digits removed by the division are discarded and come back as zeros. So a better way is to allow tempid to be a float but ensure that id is always an integer, using
id = int(tempid * (10**rand))
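This removes the exponential notation and gives you the print you want. Be aware, though, that a float holds only about 15 to 17 significant digits, so for a very long ID the low-order digits may still come out different from the original.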
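To see the difference between the two approaches, here is a small sketch using a made-up 23-digit value (big_id and rand = 5 are assumptions for illustration, not the asker's data). True division produces a float that prints in e-notation and cannot hold every digit, while floor division stays an int but zeroes the last rand digits.

```python
big_id = 12345678901234567890123  # hypothetical long ID
rand = 5

# True division (/) always produces a float in Python 3.
as_float = big_id / 10**rand * 10**rand
# Floor division (//) keeps everything as int, but drops the last `rand` digits.
as_int = big_id // 10**rand * 10**rand
# int() removes the "e" from the printout, but not the float's rounding error.
back_to_int = int(as_float)

print(as_float)     # printed in exponential notation
print(as_int)       # 12345678901234567800000
print(back_to_int)  # no "e", but the trailing digits differ from big_id
```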
That answers your actual question. However, I agree with @user2722968 that if your purpose is to create a unique ID you should use a module meant for that purpose, such as uuid. The history of computing shows that randomizing part of a string to get a random value works poorly, and getting truly random or unique values is difficult to get right. You should do it the way others have shown to work well.
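Since two answers recommend the uuid module, here is a minimal sketch of what that looks like with the standard library. uuid.uuid4() builds an ID from random bits, so it does not need the user's personal details at all; new_user_id is a hypothetical helper name.

```python
import uuid


def new_user_id() -> str:
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


user_id = new_user_id()
print(user_id)           # hex digits separated by dashes
print(uuid.uuid4().int)  # the same kind of ID as one plain 128-bit integer, no "e"
```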

I also agree with the other answers; as far as best practice goes, you should not do it this way at all. You will almost certainly make a worse than optimal solution. However, to solve the actual problem you pose, I would want to approach it in a different manner.
The problem is, as stated, that your division does not leave you with an integer result: in Python 3, / always produces a float. This is not what you want if you want to keep your value unique. You want to do all your calculations on integers only. The simplest way to achieve that is to multiply your modifiers instead of dividing your original number. That way you never leave the integer domain, and there is no need to convert your value back to an integer:
from random import randint

print(id)
rand = randint(1, 15)
multiplier = 10**rand
if randint(1, 2) == 1:
    id += multiplier * randint(2, 10000)
else:
    id -= multiplier * randint(5, 7500)
print(id)
In addition, I have used a bit of syntactic sugar that I find rather nice, namely += and -=. They add a value to and subtract a value from your variable, respectively: a = a + 3 <=> a += 3.
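To check that this integer-only version never produces an e, here is a minimal sketch that wraps the answer's logic in a function. perturb_id and its rng parameter are hypothetical names; passing a seeded random.Random only makes the result reproducible.

```python
import random


def perturb_id(user_id: int, rng: random.Random) -> int:
    """Shift some middle digits of user_id using integer arithmetic only."""
    multiplier = 10 ** rng.randint(1, 15)  # which digit position gets changed
    if rng.randint(1, 2) == 1:
        return user_id + multiplier * rng.randint(2, 10000)
    return user_id - multiplier * rng.randint(5, 7500)


original = 12345678901234567890123456789  # hypothetical long ID
changed = perturb_id(original, random.Random(42))
print(original)
print(changed)  # still an int, printed digit by digit
```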

Related

What is the best way to display numeric and symbolic expressions in python?

I need to produce calculation reports that detail step by step calculations, showing the formulas that are used and then showing how the results are achieved.
I have looked at using sympy to display symbolic equations. The problem is that a sympy symbol is stored as a variable, and therefore I cannot also store the numerical value of that symbol.
For example, for the formula σ=My/I , I need to show the value of each symbol, then the symbolic formula, then the formula with values substituted in, and finally the resolution of the formula.
M=100
y= 25
I=5
σ=My/I
σ=100*25/5
σ=500
I’m new to programming and this is something I’m struggling with. I’ve thought of perhaps building my own class, but I’m not sure how to distinguish between the different forms. In the example above, σ is at one point a numerical value, one half of a symbolic expression, and also one half of a numerical expression.
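Hopefully the following helps. This produces more or less what you want. You cannot get your fifth line of workings easily, as you'll see in the code. (The code names the third variable l rather than I; note that from sympy import * already binds I to the imaginary unit.)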
from sympy import *
# define all variables needed
# trying to keep things clear that symbols are different from their numeric values
M_label, y_label, l_label = ("M", "y", "l")
M_symbol, y_symbol, l_symbol = symbols(f"{M_label} {y_label} {l_label}", real=True)
M_value, y_value, l_value = (100, 25, 5)
# define the dictionary whose keys are string names
# and whose values are a tuple of symbols and numerical values
symbols_values = {M_label: (M_symbol, M_value),
                  y_label: (y_symbol, y_value),
                  l_label: (l_symbol, l_value)}
for name, symbol_value in symbols_values.items():
    print(f"{name} = {symbol_value[1]}")  # an f-string or formatted string
sigma = M_symbol * y_symbol / l_symbol
print(f"sigma = {sigma}")
# option 1
# changes `/5` to 5**(-1) since this is exactly how sympy views division
# credit for UnevaluatedExpr
# https://stackoverflow.com/questions/49842196/substitute-in-sympy-wihout-evaluating-or-simplifying-the-expression
sigma_substituted = sigma\
.subs(M_symbol, UnevaluatedExpr(M_value))\
.subs(y_symbol, UnevaluatedExpr(y_value))\
.subs(l_symbol, UnevaluatedExpr(l_value))
print(f"sigma = {sigma_substituted}")
# option 2
# using string substitution
# note this could turn names like `log`, `cos` or `exp` into something completely different
# (replacing `l`, for example, also hits the `l` in `log`); this is why it is not advised. Option 1 is far better for that purpose
sigma_substituted = str(sigma)\
.replace(M_label, str(M_value))\
.replace(y_label, str(y_value))\
.replace(l_label, str(l_value))
print(f"sigma = {sigma_substituted}")
sigma_simplified = sigma\
.subs(M_symbol, M_value)\
.subs(y_symbol, y_value)\
.subs(l_symbol, l_value)
print(f"sigma = {sigma_simplified}")
Also note that if you wanted to change the symbols_values dictionary so that its keys are the symbols and its values are the numerical values, you will have a hard time or a seemingly buggy experience using the keys. That is because SymPy treats symbols that share a name but differ in assumptions, such as x1 = Symbol("x") and x2 = Symbol("x", real=True), as two completely different variables, even though they print the same. Since the symbols here are created with real=True, looking one up with a plain Symbol("M") would fail. It is far easier to use strings as keys.
If you begin to use more variables and choose to work this way, I suggest using lists and for loops instead of writing the same code over and over.
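If you would rather follow the class idea from the question without SymPy, here is a minimal plain-Python sketch. Formula, report and the {M}*{y}/{I} template syntax are hypothetical names chosen for illustration; the template only controls how the formula is displayed, and the separate compute function does the arithmetic, so the two have to be kept consistent by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Formula:
    """Hypothetical helper: how to display a formula, plus how to compute it."""
    result: str                    # name of the computed quantity, e.g. "σ"
    template: str                  # display form with placeholders, e.g. "{M}*{y}/{I}"
    compute: Callable[..., float]  # evaluates the formula from keyword arguments

    def report(self, values: Dict[str, float]) -> List[str]:
        # 1) each input with its value
        lines = [f"{name} = {value}" for name, value in values.items()]
        # 2) symbolic form: each placeholder replaced by its own name
        lines.append(f"{self.result} = " + self.template.format(**{n: n for n in values}))
        # 3) numeric form: each placeholder replaced by its value
        lines.append(f"{self.result} = " + self.template.format(**values))
        # 4) the evaluated result; :g prints e.g. 500 rather than 500.0
        lines.append(f"{self.result} = {self.compute(**values):g}")
        return lines


sigma = Formula("σ", "{M}*{y}/{I}", lambda M, y, I: M * y / I)
for line in sigma.report({"M": 100, "y": 25, "I": 5}):
    print(line)
```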

Shannon Diversity Program: basic questions

I am a biology student trying to get into programming, and I am having some issues with a basic index calculator I am trying to write for a research project. I need a program that will prompt the user to input data points one at a time, perform the proper calculation (-1*(x*ln(x))) on each data point, and enter that new calculated value into an array. Once the user inputs 'done', I would like the program to sum the array values and return that index value.
This is what I have. I am very new, so apologies for any glaring mistakes. Any pointers in the right direction are very appreciated.
import math
print('This program calculates a Shannon Diversity '
      'Index value for a set of data points entered by the user.'
      ' When prompted, enter a species number value, then press enter. '
      'Continue until all data points have been entered. '
      'Upon completion, enter the word done.')
def Shannonindex():
    index = []
    entries = 1,000,000
    endKey = 'done'
    for i in range(entries):
        index = [input("Enter a value: ")]
        if index != endKey:
            entry = p
            p = -1*(x*ln(x))
            index.append(p)
        else Sindex = sum(index)
    return Sindex
print('Your Shannon Diversity Value is: ", Sindex)
There are a huge number of problems here.
You need to get your variables straight.
You're trying to use index to mean both the list of values, and the input string. It can't mean both things at once.
You're trying to use x without defining it anywhere. Presumably it's supposed to be the float value of the input string? If so, you have to say that.
You're trying to use p to define entry before p even exists. But it's not clear what entry is even useful for, since you never use it.
You also need to get your control flow straight.
What code is supposed to run in that else case? Either it has to include the return, or you need some other way to break out of the loop.
You also need to get your types straight. [input(…)] is going to give you a list with one element, the input string. It's hard to imagine what that would be useful for. You can't compare that list to 'done', or convert it to a float. What you want is just the input string itself.
You can't just guess at what functions might exist. There's no function named ln. Look at the docs for Built-in Functions, the math module, and anything else that looks like it might be relevant to find the function you need.
1,000,000 is not a number, but a tuple of three numbers.
You can write 1_000_000, or just 1000000.
But it's not clear why you need a limit in the first place. Why not just loop forever until they enter done?
You've defined a function, but you never call it, so it doesn't do any good.
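Two of the points above are easy to check in an interpreter: what 1,000,000 actually evaluates to, and which function plays the role of ln. A small sketch (the underscore literal needs Python 3.6 or later):

```python
import math

as_written = 1,000,000  # the commas build a tuple: (1, 0, 0)
intended = 1_000_000    # underscores are allowed as digit separators
print(as_written, intended)

# There is no built-in ln; math.log with one argument is the natural logarithm.
print(math.log(math.e))
```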
So, let's sort out these problems:
import math

def Shannonindex():
    index = []
    endKey = 'done'
    while True:
        value = input("Enter a value: ")
        if value != endKey:
            x = float(value)
            p = -1 * (x * math.log(x))
            index.append(p)
        else:
            Sindex = sum(index)
            return Sindex

Sindex = Shannonindex()
print('Your Shannon Diversity Value is:', Sindex)
There are still many ways you could improve this:
Add some error handling, so if the user typos 13.2.4 or 1O, it tells them to try again instead of bailing out on the whole thing.
You don't actually need to build a list, just keep a running total.
If you reverse the sense of the if/else it will probably be more readable.
You're not actually calculating the Shannon diversity index. That's not the sum of -x ln x; it's the sum of -p ln p, where each p is the proportion x / sum(all x). To handle that, you need to keep all the raw x values in a list, so you can convert them to a list of p values and sum those.
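Putting those suggestions together, here is one possible sketch. shannon_index and read_counts are hypothetical names; read_counts takes the input function as a parameter only so it can be tested without typing. Zero counts are skipped because p ln p tends to 0 as p tends to 0.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p), with p = count / total for each species."""
    total = sum(counts)
    h = 0.0
    for count in counts:
        if count > 0:  # p * ln(p) tends to 0 as p -> 0, so zero counts add nothing
            p = count / total
            h -= p * math.log(p)
    return h


def read_counts(prompt="Enter a value: ", end_key="done", read=input):
    """Collect counts until the user types end_key; re-prompt on bad input."""
    counts = []
    while True:
        value = read(prompt).strip()
        if value.lower() == end_key:
            return counts
        try:
            x = float(value)
        except ValueError:
            print(f"{value!r} is not a number, please try again.")
            continue
        if x < 0:
            print("Counts cannot be negative, please try again.")
            continue
        counts.append(x)
```

To run it interactively: print('Your Shannon Diversity Value is:', shannon_index(read_counts())).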
import math

index = []
for i in range(1, 100000):
    val = input("Enter a value: ")
    if val == 'done':
        break
    else:
        x = int(val)
        p = -1*(x*math.log(x))
        index.append(p)
print("The value of index is %s:" % index)
This is a simplified form of your code, since you are new to Python.
It stores the calculated values in a list and keeps asking for input until you type done.

how can I verify that this hash function is not gonna give me the same result for two different strings?

Consider the two different strings to be of the same length.
I am implementing the Rabin-Karp algorithm and using the hash function below:
prime = 101  # global base used by the hash

def hs(pat):
    l = len(pat)
    pathash = 0
    for x in range(l):
        pathash += ord(pat[x])*prime**x
    return pathash
It's a hash. There's, by definition, no guarantee there will be no collisions - otherwise, the hash would have to be as long as the hashed value, at least.
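The pigeonhole argument can be made concrete. The sketch below adds a modulus to the question's hash (hs_mod and the modulus 97 are assumptions for illustration; the original code has no modulus) and brute-forces two-letter lowercase strings. With 26 * 26 = 676 strings but only 97 possible hash values, a collision is guaranteed.

```python
from itertools import product
from string import ascii_lowercase

prime = 101  # same base as the question


def hs_mod(pat: str, mod: int) -> int:
    """The question's hash, reduced modulo `mod` (the modulus is an assumption)."""
    return sum(ord(c) * prime**i for i, c in enumerate(pat)) % mod


def find_collision(length: int, mod: int):
    """Return two different lowercase strings of `length` with equal hashes, or None."""
    seen = {}
    for chars in product(ascii_lowercase, repeat=length):
        s = "".join(chars)
        h = hs_mod(s, mod)
        if h in seen:
            return seen[h], s
        seen[h] = s
    return None


print(find_collision(2, 97))
```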
The idea behind what you're doing is based in number theory: powers of a number that is coprime to the size of your finite group (which the original author probably meant to be something like 2^N) can give you any number in that finite group, and it's hard to tell which powers produced a given value.
Sadly, the interesting part of this hash function, namely the size-limiting modulo operation, has been left out of this code, which makes one wonder where your code comes from. As far as I can immediately see, it has little to do with Rabin-Karp.
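For comparison, here is a minimal sketch of how Rabin-Karp usually deals with collisions: the rolling hash is reduced modulo a number (the defaults base=101 and mod=1_000_000_007 are assumptions, not from the question), and every hash match is confirmed by comparing the actual substring, so a collision costs time but never gives a wrong result. Unlike hs above, it weights the first character most heavily (Horner's rule), which keeps the window update simple.

```python
def rabin_karp(text: str, pattern: str, base: int = 101, mod: int = 1_000_000_007) -> list:
    """Return every start index where `pattern` occurs in `text`."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's first character
    p_hash = t_hash = 0
    for i in range(m):  # hash the pattern and the first window (Horner's rule)
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a real comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Slide the window: remove text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))         # [0, 7]
print(rabin_karp("abracadabra", "abra", mod=7))  # [0, 7]; a tiny modulus only adds comparisons
```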

Sometimes my set comes out ordered and sometimes not (Python)

So I know that a set is supposed to be an unordered collection. I am trying to do some coding of my own and noticed something odd. My set will sometimes come out in order from 1 to 100 (when using a larger number), and when I use a smaller number it stays unordered. Why is that?
#Steps:
#1) Take a number value for total random numbers in 1-100
#2) Put those numbers into a set (which will remove duplicates)
#3) Print that set and the total number of random numbers
import random
randomnums = 0
Min = int(1)
Max = int(100)
print('How many random numbers would you like?')
numsneeded = int(input('Please enter a number. '))
print("\n" * 25)
s = set()
while (randomnums < numsneeded):
    number = random.randint(Min, Max)
    s.add(number)
    randomnums = randomnums + 1
print(s)
print(len(s))
If anyone has any pointers on cleaning up my code I am 100% willing to learn. Thank you for your time!
When the documentation for set says it is an unordered collection, it only means that you can assume no specific order on the elements of the set. The set can choose what internal representation it uses to hold the data, and when you ask for the elements, they might come back in any order at all. The fact that they are sorted in some cases might mean that the set has chosen to store your elements in a sorted manner.
The set can make tradeoff decisions between performance and space depending on factors such as the number of elements in the set. For example, it could store small sets in a list, but larger sets in a tree. The most natural way to retrieve elements from a tree is in sorted order, so that's what could be happening for you. (CPython itself always stores sets in a hash table, as the next answer explains, so in practice the order comes from the hash layout.)
See also Can Python's set absence of ordering be considered random order? for further info about this.
Sets are implemented as hash tables. The hash of a small non-negative integer is just the integer itself. To determine where to put the number in the table, the remainder of the integer divided by the table size is used. The table starts with 8 slots, so the numbers 0 to 7 would each be placed in their own slot in order, but 8 would be placed in slot 0. If you add the numbers 1 to 4 and 8 into an empty set, it will display as:
set([8,1,2,3,4])
What happens when 5 is added is that the table becomes more than two-thirds full. At that point the table is resized to 32 slots, and the existing entries are re-inserted into the new table. Now it displays as:
set([1,2,3,4,5,8])
In your example, as long as you've added enough entries to make the table grow to 128 slots, the numbers from 1 to 100 will each be placed in their own slot, in order. If you've only added enough entries for the table to have 32 slots, but you are using numbers up to 100, the items won't necessarily be in order.
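Since the question also asked for pointers on cleaning up the code, here is one possible tidier version (a sketch, not the only way). It replaces the manual counter with a set comprehension, and prints sorted(s), which gives a predictable order regardless of how the set happens to lay out its elements. random_set and its rng parameter are hypothetical names.

```python
import random


def random_set(count: int, low: int = 1, high: int = 100, rng=random) -> set:
    """Draw `count` random integers in [low, high]; duplicates collapse in the set."""
    return {rng.randint(low, high) for _ in range(count)}


s = random_set(20, rng=random.Random(1))  # seeded only to make the run repeatable
print(sorted(s))  # ascending, whatever order the set uses internally
print(len(s))     # may be less than 20, because duplicates are dropped
```

Interactively, you would call random_set(int(input('Please enter a number. '))) instead of using the fixed 20.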

Python noob: manipulating arrays

I have already asked a few questions on here about this same topic, but I'm really trying not to disappoint the professor I'm doing research with. This is my first time using Python and I may have gotten in a little over my head.
Anyway, I was sent a file to read and was able to read it using this command:
import numpy

SNdata = numpy.genfromtxt('...', dtype=None,
                          usecols=(0,6,7,8,9,19,24,29,31,33,34,37,39,40,41,42,43,44),
                          names=['sn','off1','dir1','off2','dir2','type','gal','dist',
                                 'htype','d1','d2','pa','ai','b','berr','b0','k','kerr'])
sn is just an array of supernova names; type is an array giving the type of each supernova (Ia or II); and so on.
One of the first things I need to do is simply calculate the probabilities of certain properties given the SN type (Ia or II).
For instance, the column htype is the morphology of a galaxy (given as an integer from 1 = elliptical to 8 = irregular). I need to calculate the probability of an elliptical given Type Ia and of an elliptical given Type II, and likewise for each of the integers up to 8.
For ellipticals, I know that I just need the number of elements that have htype = 1 and type = Ia divided by the total number of elements of type = Ia. And then the number of elements that have htype = 1 and type = II divided by the total number of elements that have type = II.
I just have no idea how to write code for this. I was planning on finding the total number of each type first and then running a for loop to find the number of elements that have a certain htype given their type (Ia or II).
Could anyone help me get started with this? If any clarification is needed, let me know.
Thanks a lot.
NumPy supports boolean array operations, which will make your code fairly straightforward to write. For instance, you could do:
htype_sums = {}
# if 'type' was read as bytes, compare with b'Ia' and b'II' instead
Ia_mask = SNdata['type'] == 'Ia'
II_mask = SNdata['type'] == 'II'
for htype_number in range(1, 9):
    htype_mask = SNdata['htype'] == htype_number
    # fraction of Ia (or II) rows whose galaxy has this morphology
    Ia_sum = (htype_mask & Ia_mask).sum() / Ia_mask.sum()
    II_sum = (htype_mask & II_mask).sum() / II_mask.sum()
    htype_sums[htype_number] = (Ia_sum, II_sum)
Each of the _mask variables is a boolean array, so when you sum one you count the number of elements that are True.
You can use collections.Counter to count needed observations.
For example,
from collections import Counter
types_counter = Counter(row['type'] for row in SNdata)
will give you the counts of each SN type.
htypes_types_counter = Counter((row['type'], row['htype']) for row in SNdata)
gives the counts for each (type, morphology) pair. Then, to get your estimate for ellipticals given Type Ia, just divide:
1.0*htypes_types_counter['Ia', 1]/types_counter['Ia']
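Extending the Counter approach to every type and morphology at once, here is a minimal sketch that runs on a few hypothetical toy rows standing in for SNdata (only the 'type' and 'htype' fields are used; the numbers are invented for illustration). conditional_probabilities is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy rows standing in for SNdata.
data = [
    {"type": "Ia", "htype": 1},
    {"type": "Ia", "htype": 1},
    {"type": "Ia", "htype": 3},
    {"type": "II", "htype": 1},
    {"type": "II", "htype": 5},
    {"type": "II", "htype": 5},
    {"type": "II", "htype": 8},
]


def conditional_probabilities(rows, types=("Ia", "II"), htypes=range(1, 9)):
    """Return {(sn_type, htype): P(htype | sn_type)} for every combination."""
    type_counts = Counter(row["type"] for row in rows)
    pair_counts = Counter((row["type"], row["htype"]) for row in rows)
    return {
        (t, h): pair_counts[t, h] / type_counts[t] if type_counts[t] else 0.0
        for t in types
        for h in htypes
    }


probs = conditional_probabilities(data)
print(probs["Ia", 1])  # P(elliptical | Ia)
print(probs["II", 1])  # P(elliptical | II)
```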
