I am trying to round numbers in a DataFrame that has lists as values for each row. I need whole numbers to have no decimal and floats to have only two places after the decimal. There is an unknown number of values in each list (some lists have 2 values, some have 4 or 5 or more). Here is what I have:
import pandas as pd
from decimal import Decimal

TWOPLACES = Decimal(10) ** -2

df = pd.DataFrame({"A": [[16.0, 24.4175], [14.9687, 16.06], [22.75, 23.00]]})

def remove_exponent(num):
    return num.to_integral() if num == num.to_integral() else num.normalize()

def round_string_float(x):
    try:
        return remove_exponent(Decimal(x).quantize(TWOPLACES))
    except:
        return x

df['A']=df['A'].apply(lambda x: [round_string_float(num) for num in x])
But this gives me: [Decimal('16'), Decimal('24.42')]
Here is what I am trying:
def round(num):
    if str(numbers).find('/') > -1:
        nom, den = numbers.split(',')
        number=round_string_float(nom)
        second=round_string_float(den)
        return f'[{number}, {second}]'
but there has to be an easier way to do this
Here is what I want:
df = pd.DataFrame({"A": [[16, 24.42], [14.97, 16.06], [22.75, 23]]})
I would like to know how to use **args to do this, but really anything that works would be good.
Have you tried a for loop? For example:
rounded = []
for i in range(len(df)):
    for j in range(len(df["A"][i])):
        rounded.append(round(df["A"][i][j], 2))
That's a weird format for a DataFrame, but if you want it you can do something like this:
import pandas as pd
df = pd.DataFrame({"A": [[16.0, 24.4175], [14.9687, 16.06], [22.75, 23.00]]})
# Note: on pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map.
print(df.applymap(lambda x: [round(v, None if v.is_integer() else 2) for v in x]))
Given that
The return value [of round] is an integer if ndigits is omitted or None.
this evaluates, for each nested number v, round(v) if v is an integer else round(v, 2).
This outputs
A
0 [16, 24.42]
1 [14.97, 16.06]
2 [22.75, 23]
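The same rule can be tried on plain Python lists, without pandas. The sketch below is a variation rather than the code above: it rounds to two places first and then drops the decimal part when the result is whole, so a value such as 16.001 also comes out as 16. The helper name `tidy` is made up for illustration.

```python
def tidy(values):
    """Round each number to two places; return an int when the result is whole."""
    out = []
    for v in values:
        r = round(float(v), 2)
        # Whole results such as 16.0 become the int 16; others stay floats.
        out.append(int(r) if r.is_integer() else r)
    return out

rows = [[16.0, 24.4175], [14.9687, 16.06], [22.75, 23.00]]
print([tidy(row) for row in rows])
# [[16, 24.42], [14.97, 16.06], [22.75, 23]]
```

On a DataFrame this could be applied with something like df['A'].apply(tidy).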
I created an answer to this question that goes above and beyond what I wanted but I think it will help anyone looking for something similar. The problem with my company is we have to upload lists as values in a dataframe to the database. This is why the code is so ad-hoc:
import ast
import re
from decimal import Decimal
from fractions import Fraction
from natsort import natsorted

TWOPLACES = Decimal(10) ** -2
#----------------------------------------------------------------
# remove_exponent and round_string_float turn whole numbers such as 16.00 into 16 and round numbers with 3 or more decimals to 2 decimals (16.254 becomes 16.25)
def remove_exponent(num):
    return num.to_integral() if num == num.to_integral() else num.normalize()

def round_string_float(x):
    try:
        return remove_exponent(Decimal(x).quantize(TWOPLACES))
    except Exception:
        # anything that cannot be converted is returned unchanged
        return x
#------------------------------------------------------------------------------
# frac2string converts fractions to decimals: 1 1/2 to 1.5
def frac2string(s):
    i, f = s.groups(0)
    f = round_string_float(Fraction(f))
    return str(int(i) + round_string_float(float(f)))
#------------------------------------------
# remove_duplicates is self-explanatory; it keeps the first occurrence of each element
def remove_duplicates(A):
    # dict.fromkeys preserves order and avoids mutating the list while iterating over it
    return list(dict.fromkeys(A))
# converts fractions and rounds numbers
df['matches'] = df['matches'].apply(lambda x:[re.sub(r'(?:(\d+)[-\s])?(\d+/\d+)', frac2string, x)])
# removes duplicates (this needs to be in the format ["\d","\d"])
df['matches'] = df['matches'].apply(lambda x: remove_duplicates([n.strip() for n in ast.literal_eval(x)]))
I have a dataframe containing 15K+ strings in the format of xxxx-yyyyy-zzz. The yyyyy is a random 5 digit number generated. Given that I have xxxx as 1000 and zzz as 200, how can I generate the random yyyyy and add it to the dataframe so that the string is unique?
number
0 1000-12345-100
1 1000-82045-200
2 1000-93035-200
import pandas as pd
data = {"number": ["1000-12345-100", "1000-82045-200", "1000-93035-200"]}
df = pd.DataFrame(data)
print(df)
I'd generate a new column with just the middle values and generate random numbers until you find one that's not in the column.
from random import randint
df["excl"] = df.number.apply(lambda x:int(x.split("-")[1]))
num = randint(10000, 99999)
while num in df.excl.values:
    num = randint(10000, 99999)
I tried to come up with a generic approach, you can use this for lists:
import random
number_series = ["1000-12345-100", "1000-82045-200", "1000-93035-200"]
def rnd_nums(n_numbers: int, number_series: list, max_length: int=5, prefix: int=1000, suffix: int=100):
    # ignore following numbers (a set keeps the membership test below fast)
    blacklist = {int(x.split('-')[1]) for x in number_series}
    # define space with allowed numbers
    rng = range(0, 10**max_length)
    # get unique sample of length "n_numbers"
    lst = random.sample([i for i in rng if i not in blacklist], n_numbers)
    # return sample as string with pre- and suffix
    return ['{}-{:05d}-{}'.format(prefix, mid, suffix) for mid in lst]
rnd_nums(5, number_series)
Out[69]:
['1000-79396-100',
'1000-30032-100',
'1000-09188-100',
'1000-18726-100',
'1000-12139-100']
Or use it to generate new rows in a DataFrame:
import pandas as pd
data = {"number": ["1000-12345-100", "1000-82045-200", "1000-93035-200"]}
df = pd.DataFrame(data)
print(df)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df, pd.DataFrame({'number': rnd_nums(5, number_series)})], ignore_index=True)
Out[72]:
number
0 1000-12345-100
1 1000-82045-200
2 1000-93035-200
3 1000-00439-100
4 1000-36284-100
5 1000-64592-100
6 1000-50471-100
7 1000-02005-100
In addition to the other suggestions, you could also write a function that takes your df and the amount of new numbers you would like to add as arguments, appends it with the new numbers and returns the updated df. The function could look like this:
import pandas as pd
import random
def add_number(df, num):
    lst = []
    for n in df["number"]:
        n = n.split("-")[1]
        lst.append(int(n))
    for i in range(num):
        check = False
        while check == False:
            new_number = random.randint(10000, 99999)
            if new_number not in lst:
                lst.append(new_number)
                l = len(df["number"])
                df.at[l+1,"number"] = "1000-%i-200" % new_number
                check = True
    df = df.reset_index(drop=True)
    return df
This would have the advantage that you could use the function every time you want to add new numbers.
Try this:
import random
df['number'] = [f"1000-{x}-200" for x in random.sample(range(10000, 99999), len(df))]
output:
number
0 1000-24744-200
1 1000-28991-200
2 1000-98322-200
...
One option is to use sample from the random module:
import random
num_digits = 5
col_length = 15000
rand_nums = random.sample(range(10**num_digits),col_length)
# str.join takes a single iterable, so the three parts go in a list
data["number"] = ['-'.join(
    ['1000', str(num).zfill(num_digits), '200'])
    for num in rand_nums]
It took my computer about 30 ms to generate the numbers. For numbers with more digits, it may become infeasible.
Another option is to just take sequential integers, then encrypt them. This will result in a sequence in which each element is unique. They will be pseudo-random, rather than truly random, but then Python's random module is producing pseudo-random numbers as well.
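A real cipher is more than a short example can carry, so the sketch below swaps in a much weaker stand-in for encryption: an affine map modulo 10**5. It still has the property that matters here, namely that distinct inputs give distinct outputs. The function name `scramble` and the constants `a` and `b` are arbitrary choices for illustration, and the result is easy to predict, so it should not be used where unguessable values are required.

```python
from math import gcd

def scramble(i, digits=5, a=48271, b=12345):
    """Map 0..10**digits-1 onto itself without collisions (an affine permutation)."""
    modulus = 10 ** digits
    # The map is a bijection only when a shares no factor with the modulus.
    if gcd(a, modulus) != 1:
        raise ValueError("a must be coprime to 10**digits")
    return (a * i + b) % modulus

# Sequential integers go in, unique scrambled middle parts come out.
numbers = [f"1000-{scramble(i):05d}-200" for i in range(15000)]
print(numbers[:3])
print(len(set(numbers)) == len(numbers))
```

Because the map is a permutation, no uniqueness check or retry loop is needed, however many values are generated (up to 10**5 for five digits).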
I have an integer integer = 10101001. I wanted to split that number into an array of 2 four bit numbers array = [1010,1001]. How do I do this? Are there any python methods?
This is a way to do it:
num = 10101001
str_num = str(num)
split_num = [int(str_num[0:4]), int(str_num[4:])]
print(split_num)
Output:
[1010, 1001]
You need to go through the string version of your int:
i = 10101001
str_i = str(i)
res = str_i[:len(str_i) // 2], str_i[len(str_i) // 2:]
print(res) # ('1010', '1001')
If that is indeed the general case, you can use a simple method.
def func(x):
    x = str(x)  # this takes x and turns it into a string
    sub1 = int(x[:4])  # this takes the first 4 digits and turns them into an integer
    sub2 = int(x[4:])  # this takes the remaining digits and turns them into an integer
    return [sub1, sub2]
Note that I used the fact that strings are subscriptable. You can fetch characters in a string just like items in a list.
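For comparison, the split can also be done with integer arithmetic instead of string slicing. This is a minimal sketch, assuming the number should be cut so that the low part holds the last four decimal digits; `split_digits` and `low_digits` are names invented here.

```python
def split_digits(n, low_digits=4):
    """Split n into [high, low], where low holds the last low_digits decimal digits."""
    high, low = divmod(n, 10 ** low_digits)
    return [high, low]

print(split_digits(10101001))
# [1010, 1001]
```

One difference from the string version: leading zeros in the low half disappear, because the parts are integers (10100001 gives [1010, 1]).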
This function is supposed to take a string of digits (snum) and the index it should start at (indx), then, starting at that index, multiply the next (dig) digits together and return the value. The current function should return 72, but it is returning 41472. Thank you!
def product(dig, indx, snum):
    length = int(len(snum))
    int(indx)
    int(dig)
    total = int(snum[indx])
    for k in range((indx + 1), length):
        for i in range(0, dig):
            total = total * int(snum[k])
    else:
        return total
x = product(3, 5, '72890346')
print(x)
Following should do it :
def product(dig, indx, snum):
    mul = 1
    for s in snum[indx : indx+dig+1]: #Note the `dig+1`
        mul *= int(s) #multiply the number
    return mul
Driver code :
x = product(3, 5, '72890346')
print(x)
#72
In your code, the logic has a few problems. You do not need two loops: the inner loop multiplies by the same digit dig times, which is why the result is too large. Here, we use slicing to get the characters between indx and indx+dig, then convert each character to int and multiply.
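The slicing idea can be written more compactly with math.prod (Python 3.8+). This is only a sketch: it assumes the window is exactly dig digits beginning at indx, whereas the slice above takes dig+1 characters; both give 72 for the example because the string ends after three digits. The name `digit_product` is made up for illustration.

```python
from math import prod  # available in Python 3.8+

def digit_product(dig, indx, snum):
    """Multiply the dig digits of snum starting at position indx."""
    return prod(int(ch) for ch in snum[indx:indx + dig])

print(digit_product(3, 5, '72890346'))
# 72
```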
I need to compare numbers that look like 12,3K, 1,84M, etc.
e.g.:
a = 12,3K
b = 1,84M
if b > a :
    print b
You need to use replace for it:
a = ("12,3K", "1,84M")
numbers = {"K": 1000, "M": 1000000}
result = []
for value in a:
    if value:
        i = value[-1]  # the suffix, K or M
        # treat the comma as a decimal separator, then scale by the suffix
        value = float(value[:-1].replace(',', '.')) * numbers[i]
        result.append(int(value))
print(max(result))
You can add more suffixes to the dictionary to support larger units.
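I would recommend a function to convert a and b into the corresponding number, like so (also, I'd make a and b strings):
def convert(num):
    # treat the comma as a decimal separator and scale by the K/M suffix,
    # so the comparison is numeric rather than a string comparison
    multipliers = {'K': 1000, 'M': 1000000}
    return float(num[:-1].replace(',', '.')) * multipliers[num[-1]]
a = '12,3K'
b = '1,84M'
if convert(b) > convert(a):
    print(b)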
If your values are strings, then the re module would make it easy to replace commas with '' and K or M with 3 or 6 zeroes. Then wrap in int() and compare. Where / how are you getting the values you're comparing?
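A regex-based version might look like the sketch below. It makes one assumption that differs from the comment above: the comma is treated as a decimal separator (so 12,3K means 12.3 thousand), as in the first answer, rather than being stripped. The names `parse_amount`, `PATTERN` and `MULTIPLIERS` are invented for this example, and Decimal is used only to keep the arithmetic exact.

```python
import re
from decimal import Decimal

MULTIPLIERS = {"K": 1_000, "M": 1_000_000}
# digits, an optional ",digits" fraction, then a K or M suffix
PATTERN = re.compile(r"(\d+)(?:,(\d+))?([KM])")

def parse_amount(text):
    """Convert strings such as '12,3K' or '1,84M' to an exact Decimal."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised amount: {text!r}")
    whole, frac, suffix = match.groups()
    # Assumption: the comma is a decimal separator, so '12,3' means 12.3.
    return Decimal(f"{whole}.{frac or '0'}") * MULTIPLIERS[suffix]

a = "12,3K"
b = "1,84M"
if parse_amount(b) > parse_amount(a):
    print(b)
```

Rejecting unrecognised input with an exception, instead of guessing, keeps a malformed value from silently winning the comparison.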
What is the quickest and cleanest way to convert an integer into a list?
For example, change 132 into [1,3,2] and 23 into [2,3]. I have a variable which is an int, and I want to be able to compare the individual digits so I thought making it into a list would be best, since I can just do int(number[0]), int(number[1]) to easily convert the list element back into int for digit operations.
Convert the integer to string first, and then use map to apply int on it:
>>> num = 132
>>> map(int, str(num)) #note, This will return a map object in python 3.
[1, 3, 2]
or using a list comprehension:
>>> [int(x) for x in str(num)]
[1, 3, 2]
Great methods have already been mentioned on this page, but it is not obvious which one to use. So I have added some measurements to help you decide for yourself:
A large number has been used (for overhead) 1111111111111122222222222222222333333333333333333333
Using map(int, str(num)):
import timeit
def method():
    num = 1111111111111122222222222222222333333333333333333333
    return map(int, str(num))
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.018631496999999997
Using list comprehension:
import timeit
def method():
    num = 1111111111111122222222222222222333333333333333333333
    return [int(x) for x in str(num)]
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.28403817900000006
Code taken from this answer
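The results suggest that the first method, which uses built-ins, is much faster than the list comprehension. Note, however, that in Python 3 map returns a lazy map object, so method() as timed never actually converts the digits; the call has to be wrapped in list(...) for a like-for-like comparison.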
The "mathematical way":
import timeit
def method():
    q = 1111111111111122222222222222222333333333333333333333
    ret = []
    while q != 0:
        q, r = divmod(q, 10) # Divide by 10, see the remainder
        ret.insert(0, r) # The remainder is the first to the right digit
    return ret
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.38133582499999996
Code taken from this answer
The list(str(123)) method (does not provide the right output):
import timeit
def method():
    return list(str(1111111111111122222222222222222333333333333333333333))
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.028560138000000013
Code taken from this answer
The answer by Duberly González Molinari:
import timeit
def method():
    n = 1111111111111122222222222222222333333333333333333333
    l = []
    while n != 0:
        l = [n % 10] + l
        n = n // 10
    return l
print(timeit.timeit("method()", setup="from __main__ import method", number=10000))
Output: 0.37039988200000007
Code taken from this answer
Remarks:
In these measurements map(int, str(num)) is the fastest method, and the list comprehension is the second fastest of those that give the right output. Keep in mind that on Python 3 the timed map call is lazy and does no conversion work until it is consumed, so its figure understates the real cost.
Those that reinvent the wheel are interesting but are probably not so desirable in real use.
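For a like-for-like comparison on Python 3, the map version has to be consumed. The sketch below times list(map(...)) against the list comprehension; the function names are invented here, and no timings are quoted because they depend on the machine and Python version.

```python
import timeit

NUM = 1111111111111122222222222222222333333333333333333333

def with_map():
    # list() forces the lazy map object to actually do the conversions
    return list(map(int, str(NUM)))

def with_comprehension():
    return [int(x) for x in str(NUM)]

# Both approaches must produce the same list of digits.
assert with_map() == with_comprehension()

for func in (with_map, with_comprehension):
    elapsed = timeit.timeit(func, number=10000)
    print(f"{func.__name__}: {elapsed:.4f} s")
```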
The shortest and best way is already answered, but the first thing I thought of was the mathematical way, so here it is:
def intlist(n):
    q = n
    ret = []
    while q != 0:
        q, r = divmod(q, 10) # Divide by 10, see the remainder
        ret.insert(0, r) # The remainder is the first to the right digit
    return ret
print(intlist(3))
print('-')
print(intlist(10))
print('--')
print(intlist(137))
It's just another interesting approach, you definitely don't have to use such a thing in practical use cases.
n = int(input("n= "))
def int_to_list(n):
    l = []
    while n != 0:
        l = [n % 10] + l
        n = n // 10
    return l
print(int_to_list(n))
If you have a string like this: '123456'
and you want a list of integers like this: [1,2,3,4,5,6], use this:
>>> s = '123456'
>>> list1 = [int(i) for i in list(s)]
>>> print(list1)
[1, 2, 3, 4, 5, 6]
or if you want a list of strings like this: ['1','2','3','4','5','6'], use this:
>>> s = '123456'
>>> list1 = list(s)
>>> print(list1)
['1', '2', '3', '4', '5', '6']
Use list on a number converted to string:
In [1]: [int(x) for x in list(str(123))]
Out[1]: [1, 2, 3]
>>>list(map(int, str(number))) #number is a given integer
It returns a list of all digits of number.
you can use:
First convert the value to a string so that you can iterate over it; then each character can be converted to an integer:
value = 12345
l = [ int(item) for item in str(value) ]
By looping it can be done the following way :)
num1 = int(input('Enter the number'))
sum1 = num1  # work on a copy so the original value is not affected
y = []  # list that will hold the digits
while True:
    if sum1 == 0:  # stop once no digits remain
        break
    d = sum1 % 10  # the last digit of the integer is saved in d
    sum1 = sum1 // 10  # drop the last digit, e.g. 4320 becomes 432 (// stays exact for large ints)
    y.append(d)  # digits are collected from last to first
y.reverse()  # reverse to restore the original digit order
print(y)
Answer becomes
Enter the number2342
[2, 3, 4, 2]
num = 123
print(num)
num = list(str(num))
num = [int(i) for i in num]
print(num)
num = list(str(100))
index = len(num)
while index > 0:
    index -= 1
    num[index] = int(num[index])
print(num)
It prints [1, 0, 0].
Takes an integer as input and converts it into a list of its digits (as strings).
code:
num = int(input())
print(list(str(num)))
output using 156789:
>>> ['1', '5', '6', '7', '8', '9']