Grouping list of integers in a range into chunks - python

Given a set or a list (assume it's ordered):
myset = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
I want to find out how many numbers fall in each range.
Say my range is 10. Then, given the list above, I have two sets of 10,
so I want the function to return [10,10].
If my range were 15, I should get [15,5].
The range will change. Here is what I came up with:
myRange = 10
start = 1
current = start
next = current + myRange
count = 0
setTotal = []
for i in myset:
    if i >= current and i < next :
        count = count + 1
        print(str(i)+" in "+str(len(setTotal)+1))
    else:
        current = current + myRange
        next = myRange + current
        if next >= myset[-1]:
            next = myset[-1]
        setTotal.append(count)
        count = 0
print(setTotal)
Output
1 in 1
2 in 1
3 in 1
4 in 1
5 in 1
6 in 1
7 in 1
8 in 1
9 in 1
10 in 1
12 in 2
13 in 2
14 in 2
15 in 2
16 in 2
17 in 2
18 in 2
19 in 2
[10, 8]
Notice that 11 and 20 were skipped. I also played around with the condition and got weird results.
EDIT: The range defines a span of values; every value inside that span should be counted into one chunk.
Think of a range as running from the current value to current value + range, as one chunk.
EDIT:
Wanted output:
1 in 1
2 in 1
3 in 1
4 in 1
5 in 1
6 in 1
7 in 1
8 in 1
9 in 1
10 in 1
11 in 2
12 in 2
13 in 2
14 in 2
15 in 2
16 in 2
17 in 2
18 in 2
19 in 2
[10, 10]

With the right key function, the groupby function in the itertools module makes doing this fairly simple:
from itertools import groupby
def ranger(values, range_size):
    def keyfunc(n):
        # 1-based chunk number: 1..range_size -> 1, the next range_size values -> 2, ...
        key = (n - 1) // range_size + 1
        print('{} in {}'.format(n, key))
        return key
    return [len(list(g)) for k, g in groupby(values, key=keyfunc)]
myset = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
print(ranger(myset, 10))
print(ranger(myset, 15))
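The same value-based counting can also be sketched with collections.Counter, which does not require the input to be sorted. This is only an illustration, not the answerer's method: the name count_by_range and the start parameter are hypothetical, and it assumes every value is an integer greater than or equal to start.

```python
from collections import Counter

def count_by_range(values, range_size, start=1):
    """Count how many values fall in each consecutive range of `range_size`."""
    # Bucket 0 covers start..start+range_size-1, bucket 1 the next range, etc.
    counts = Counter((v - start) // range_size for v in values)
    if not counts:
        return []
    # Include empty ranges so list positions line up with range numbers.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]

myset = list(range(1, 21))
print(count_by_range(myset, 10))  # [10, 10]
print(count_by_range(myset, 15))  # [15, 5]
```

Unlike groupby, this reports a 0 for a range that contains no values instead of silently skipping it.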

You want to use simple division and the remainder; the divmod() function gives you both:
def chunks(lst, size):
    count, remainder = divmod(len(lst), size)
    return [size] * count + ([remainder] if remainder else [])
To create your desired output, then use the output of chunks():
lst = range(1, 21)
size = 10
start = 0
for count, chunk in enumerate(chunks(lst, size), 1):
    for i in lst[start:start + chunk]:
        print('{} in {}'.format(i, count))
    start += chunk
count is the number of the current chunk (starting at 1; Python normally uses 0-based indexing).
This prints:
1 in 1
2 in 1
3 in 1
4 in 1
5 in 1
6 in 1
7 in 1
8 in 1
9 in 1
10 in 1
11 in 2
12 in 2
13 in 2
14 in 2
15 in 2
16 in 2
17 in 2
18 in 2
19 in 2
20 in 2

If you don't care about what numbers are in a given chunk, you can calculate the size easily:
def chunk_sizes(lst, size):
    complete = len(lst) // size  # Number of `size`-sized chunks
    partial = len(lst) % size    # Last chunk
    if partial:  # Sometimes the last chunk is empty
        return [size] * complete + [partial]
    else:
        return [size] * complete
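If the contents of each chunk do matter, a position-based split can be sketched with slicing. The helper name split_chunks is hypothetical; like the size calculation above, it chunks by position in the list, not by value.

```python
def split_chunks(lst, size):
    """Split `lst` into consecutive slices of at most `size` items."""
    return [lst[i:i + size] for i in range(0, len(lst), size)]

chunks = split_chunks(list(range(1, 21)), 15)
print([len(c) for c in chunks])  # [15, 5]
```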

Related

Sum of Fibonacci sequences?

I'm trying to compute the running sum of the Fibonacci sequence using definite loops. Each Fibonacci number should be printed together with the sum of all numbers up to and including it. Below is a sample of the Fibonacci sequence and its summation. How do I add up the Fibonacci numbers, e.g. 1, 1, 2, 3, 5, 8?
Fibonacci Summation
0 0
1 1
1 2
2 4
3 7
5 12
8 20
n = int(input("enter"))
def fibonacciSeries():
    a=0
    b=1
    for i in range (n-2):
        x = a+b
        a=b
        b=x
        int(x)
        x[i]= x+x[i-1]
        #should add the previous sequences
    print(x)
fibonacciSeries()
You don't need to keep track of the whole sequence. Plus your Fibonacci implementation doesn't start with 1, 1 but rather 1, 2 so I fixed that.
def fibonacciSeries(n):
    a=0
    b=1
    x=1
    series_sum = 0
    for i in range (n-2):
        series_sum += x
        print(f'{x} {series_sum}')
        x = a+b
        a=b
        b=x
n = 10
fibonacciSeries(n)
Output:
1 1
1 2
2 4
3 7
5 12
8 20
13 33
21 54
def fibonacciSeries(n):
    total = 0  # running sum (named `total` to avoid shadowing the built-in sum)
    a = 0
    b = 1
    x = 1
    for i in range(0,n - 2):
        total += x
        print(x,total)
        x = a + b
        a = b
        b = x
n = int(input("enter : ")) # n = 8
fibonacciSeries(n)
Output:
enter : 8
1 1
1 2
2 4
3 7
5 12
8 20
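As an alternative sketch, the running sum can be produced with itertools.accumulate from the standard library. The helper name fibonacci is hypothetical, and the sequence is assumed to start at 0 so that the output matches the table in the question.

```python
from itertools import accumulate

def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 0, 1, 1, 2, ..."""
    a, b = 0, 1
    numbers = []
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers

fib = fibonacci(7)
# accumulate yields the running totals of the sequence.
for value, total in zip(fib, accumulate(fib)):
    print(value, total)
```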

Is there a way to reference a previous value in Pandas column efficiently?

I want to do some complex calculations in pandas while referencing previous values (basically I'm calculating row by row). However the loops take forever and I wanted to know if there was a faster way. Everybody keeps mentioning using shift but I don't understand how that would even work.
import pandas as pd
df = pd.DataFrame(index=range(500))
df["A"]= 2
df["B"]= 5
df["A"][0]= 1
for i in range(len(df)):
    if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25
The numpy_ext package can be used for expanding calculations;
see "pandas-rolling-apply-using-multiple-columns" for reference.
I have also included a simpler calculation to demonstrate the behaviour more plainly.
import numpy as np
import pandas as pd
import numpy_ext as npe
df = pd.DataFrame(index=range(5000))
df["A"]= 2
df["B"]= 5
df["A"][0]= 1
# for i in range(len(df)):
#     if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25
# SO example - function of previous values in A and B
def f(A,B):
    r = np.sum(A[:-1]/3) - np.sum(B[:-1] + 25) if len(A)>1 else A[0]
    return r
# much simpler example, sum of previous values
def g(A):
    return np.sum(A[:-1])
df["AB_combo"] = npe.expanding_apply(f, 1, df["A"].values, df["B"].values)
df["A_running"] = npe.expanding_apply(g, 1, df["A"].values)
print(df.head(10).to_markdown())
sample output
   A  B  AB_combo  A_running
0  1  5         1          0
1  2  5  -29.6667          1
2  2  5       -59          3
3  2  5  -88.3333          5
4  2  5  -117.667          7
5  2  5      -147          9
6  2  5  -176.333         11
7  2  5  -205.667         13
8  2  5      -235         15
9  2  5  -264.333         17
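The functions f and g above operate on the original column values, so they do not feed each newly computed A back into the next row the way the loop in the question does. A minimal sketch of that exact recurrence, run over plain Python values instead of indexing the DataFrame row by row, might look like this. The name recurrence is hypothetical and no timing was measured here.

```python
def recurrence(a0, b):
    """Return A with A[0] = a0 and A[i] = A[i-1] / 3 - B[i-1] + 25."""
    a = [float(a0)]
    for i in range(1, len(b)):
        # Each value depends on the previously computed one, so a loop is needed.
        a.append(a[i - 1] / 3 - b[i - 1] + 25)
    return a

b = [5] * 500
a = recurrence(1, b)
print(a[:3])
```

With a DataFrame, the input could come from df["B"].tolist() and the result be assigned back with df["A"] = a.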

Index and save last N points from a list that meets conditions from dataframe Python

I have a DataFrame that contains gas concentrations and the corresponding valve number. This data was taken continuously where we switched the valves back and forth (valves=1 or 2) for a certain amount of time to get 10 cycles for each valve value (20 cycles total). A snippet of the data looks like this (I have 2,000+ points and each valve stayed on for about 90 seconds each cycle):
gas1 valveW time
246.9438 2 1
247.5367 2 2
246.7167 2 3
246.6770 2 4
245.9197 1 5
245.9518 1 6
246.9207 1 7
246.1517 1 8
246.9015 1 9
246.3712 2 10
247.0826 2 11
... ... ...
My goal is to save the last N points of each valve's cycle. For example, the first cycle where valve=1, I want to index and save the last N points from the end before the valve switches to 2. I would then save the last N points and average them to find one value to represent that first cycle. Then I want to repeat this step for the second cycle when valve=1 again.
I am currently converting from MATLAB to Python, so here is the MATLAB code that I am trying to translate:
% NOAA high
n2o_noaaHigh = [];
co2_noaaHigh = [];
co_noaaHigh = [];
h2o_noaaHigh = [];
ind_noaaHigh_end = zeros(1,length(t_c));
numPoints = 40;
for i = 1:length(valveW_c)-1
    if (valveW_c(i) == 1 && valveW_c(i+1) ~= 1)
        test = (i-numPoints):i;
        ind_noaaHigh_end(test) = 1;
        n2o_noaaHigh = [n2o_noaaHigh mean(n2o_c(test))];
        co2_noaaHigh = [co2_noaaHigh mean(co2_c(test))];
        co_noaaHigh = [co_noaaHigh mean(co_c(test))];
        h2o_noaaHigh = [h2o_noaaHigh mean(h2o_c(test))];
    end
end
ind_noaaHigh_end = logical(ind_noaaHigh_end);
This is what I have so far for Python:
# NOAA high
n2o_noaaHigh = []
co2_noaaHigh = []
co_noaaHigh = []
h2o_noaaHigh = []
t_c_High = []  # time
for i in range(len(valveW_c)):
    # NOAA HIGH
    if (valveW_c[i] == 1):
        t_c_High.append(t_c[i])
        n2o_noaaHigh.append(n2o_c[i])
        co2_noaaHigh.append(co2_c[i])
        co_noaaHigh.append(co_c[i])
        h2o_noaaHigh.append(h2o_c[i])
Thanks in advance!
I'm not sure if I understood correctly, but I guess this is what you are looking for:
# First we create a column to show cycles:
df['cycle'] = (df.valveW.diff() != 0).cumsum()
print(df)
gas1 valveW time cycle
0 246.9438 2 1 1
1 247.5367 2 2 1
2 246.7167 2 3 1
3 246.677 2 4 1
4 245.9197 1 5 2
5 245.9518 1 6 2
6 246.9207 1 7 2
7 246.1517 1 8 2
8 246.9015 1 9 2
9 246.3712 2 10 3
10 247.0826 2 11 3
Now you can use the groupby method to get the average of the last n points of each cycle:
n = 3 #we assume this is n
df.groupby('cycle').apply(lambda x: x.iloc[-n:, 0].mean())
Output:
cycle 0
1 246.9768
2 246.6579
3 246.7269
Let's call your DataFrame df; then you could do:
results = {}
for k, v in df.groupby((df['valveW'].shift() != df['valveW']).cumsum()):
    results[k] = v
    print(f'[group {k}]')
    print(v)
shift(), as its name suggests, shifts the valveW column by one row, so comparing the shifted column with the original detects every point where the valve number changes. cumsum() then gives a unique number to each run of identical valve values. We can then do a groupby() on this column (grouping on valveW directly would not work, because that would only produce two groups: the ones and the twos).
For your data snippet, this prints (with the groups also saved in results):
[group 1]
gas1 valveW time
0 246.9438 2 1
1 247.5367 2 2
2 246.7167 2 3
3 246.6770 2 4
[group 2]
gas1 valveW time
4 245.9197 1 5
5 245.9518 1 6
6 246.9207 1 7
7 246.1517 1 8
8 246.9015 1 9
[group 3]
gas1 valveW time
9 246.3712 2 10
10 247.0826 2 11
Then, to get the mean for each cycle, you could e.g. do:
df.groupby((df['valveW'].shift() != df['valveW']).cumsum()).mean()
which gives (again for your code snippet):
gas1 valveW time
valveW
1 246.96855 2.0 2.5
2 246.36908 1.0 7.0
3 246.72690 2.0 10.5
Here the mean of time is not meaningful; the gas1 column is the one of interest.
Then, based on results, you could e.g. do:
import numpy as np
n = 3
mean_n_last = []
for k, v in results.items():
    if len(v) < n:
        mean_n_last.append(np.nan)
    else:
        mean_n_last.append(np.nanmean(v.iloc[len(v) - n:, 0]))
which gives [246.9768, 246.65796666666665, nan] for n = 3.
If your DataFrame is sorted by time, you can get the last N records for each valve like this:
N=2
valve1 = df[df['valveW']==1].iloc[-N:,:]
valve2 = df[df['valveW']==2].iloc[-N:,:]
If it isn't currently sorted, you can sort it first (sort_values returns a new DataFrame, so assign the result):
df = df.sort_values(by=['time'])
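For data held in plain Python lists, as in the question's attempted translation, the per-cycle averaging can be sketched with itertools.groupby from the standard library. This is an illustration under assumptions, not a line-for-line port of the MATLAB loop: the name last_n_means and its parameters are hypothetical, and a cycle shorter than n is averaged over whatever points it has.

```python
from itertools import groupby
from statistics import mean

def last_n_means(values, valves, n, target=1):
    """Average the last n values of every consecutive run where valve == target."""
    means = []
    pairs = zip(valves, values)
    # groupby splits the stream into runs of consecutive identical valve numbers.
    for valve, run in groupby(pairs, key=lambda p: p[0]):
        if valve != target:
            continue
        run_values = [v for _, v in run]
        means.append(mean(run_values[-n:]))
    return means

gas1 = [246.9438, 247.5367, 246.7167, 246.6770, 245.9197, 245.9518,
        246.9207, 246.1517, 246.9015, 246.3712, 247.0826]
valveW = [2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2]
print(last_n_means(gas1, valveW, n=3))
```

Each entry in the returned list corresponds to one cycle of the target valve, in the order the cycles occur.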

Print a number table in a simple format

I am stuck trying to print out a table in Python which would look like this (first number stands for amount of numbers, second for amount of columns):
>>> print_table(13,4)
0 1 2 3
4 5 6 7
8 9 10 11
12 13
Does anyone know a way to achieve this?
This is slightly more difficult than it sounds initially.
def numbers(n, r):
    print('\n'.join(' '.join(map(str, range(r*i, min(r*(i + 1), n + 1)))) for i in range(n//r + 1)))
numbers(13, 4)
#>>> 0 1 2 3
#>>> 4 5 6 7
#>>> 8 9 10 11
#>>> 12 13
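def numbers(a,b):
    i=0
    c=0
    while i<=a:
        print(i,end=" ") #prevents printing a new line
        i+=1 #move on to the next number (without this the loop never ends)
        c+=1
        if c>=b:
            print() #prints a new line when the number of columns is reached and then resets the current column number
            c=0
    if c:
        print() #finish the last, partial row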
I think it should work
def num2(n=10, r=3):
    print('\n'.join(' '.join(tuple(map(str, range(n+1)))[i:i+r]) for i in range(0, n+1, r)))
<<<
0 1 2
3 4 5
6 7 8
9 10
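If the columns should also line up when the numbers have different widths, one possible sketch right-justifies every entry to the width of the largest number. The name format_table is hypothetical; it returns the table as a string so the caller decides where to print it.

```python
def format_table(n, columns):
    """Return the numbers 0..n laid out in rows of `columns`, right-aligned."""
    width = len(str(n))  # the largest number determines the field width
    rows = []
    for start in range(0, n + 1, columns):
        row = range(start, min(start + columns, n + 1))
        rows.append(" ".join(str(i).rjust(width) for i in row))
    return "\n".join(rows)

print(format_table(13, 4))
```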

Number Pyramid Nested for Loop

I'm wondering if you could help me out. I'm trying to write a nested for loop in Python 3 that displays a number pyramid that looks like;
1
1 2 1
1 2 4 2 1
1 2 4 8 4 2 1
1 2 4 8 16 8 4 2 1
1 2 4 8 16 32 16 8 4 2 1
1 2 4 8 16 32 64 32 16 8 4 2 1
1 2 4 8 16 32 64 128 64 32 16 8 4 2 1
Can anybody help me out? It would be much appreciated!
This is what I have so far:
col = 1
for i in range(-1, 18, col*2):
    for j in range(1, 0, 1):
        print(" ", end = "")
    for j in range(i, 0, -2):
        print(j, end = " ")
    print()
So, I can only get half of the pyramid to display.
I guess the main problem I'm having is:
How do I get the output to display an increasing and then decreasing value (i.e. 1, 2, 4, 2, 1)?
An alternate way, using list comprehensions.
Always break the problem down into digestible chunks. Each line is a mirror of itself, so let's first just build the set of numbers we need.
This generates a list of lists of strings holding the powers of two for the left half of each row:
lines = []
for i in range(1,9):
    lines.append([str(2**j) for j in range(i)])
But if we just print this list, (a) it's only going to have half of each row, and (b) it's going to mush the numbers together. We need to pad the numbers with spaces. Fortunately, the last row has the widest numbers in every column, so:
First, work out how long each line needs to end up being (we need this later) and how wide the longest number in each column is. We can use len because we cast the numbers to strings above.
b = len(lines[-1])
buffers = [len(x) for x in lines[-1]]
Now I have everything I need to print the strings (we stopped using numbers above).
So, for each line, find out how long it is and expand it to the length of the longest line by filling the left of the list with blank strings (for now we're still pretending we're only printing the left half of the triangle):
for line in lines:
    l = len(line)
    line = [" "]*(b-len(line)) + line
With each line now padded, we make a new list that we will print from (still inside the loop). By zip()ing together the line and the buffers, we can easily right-justify (str.rjust()) the numeric strings, expanded out to the required width.
    out = []
    for x,y in zip(line,buffers):
        out.append(x.rjust(y))
Remember, until now we've still just been working with the left half of the pyramid. So we take the output list, reverse it (out[::-1]), take every element but the first ([1:]), join it all together with spaces, and print it out.
    print(" ".join(out+out[::-1][1:]))
Voila! The completed code:
lines = []
for i in range(1,9):
    lines.append([str(2**j) for j in range(i)])
b = len(lines[-1])
buffers = [len(x) for x in lines[-1]]
for line in lines:
    l = len(line)
    line = [" "]*(b-len(line)) + line
    out = []
    for x,y in zip(line,buffers):
        out.append(x.rjust(y))
    print(" ".join(out+out[::-1][1:]))
Output:
1
1 2 1
1 2 4 2 1
1 2 4 8 4 2 1
1 2 4 8 16 8 4 2 1
1 2 4 8 16 32 16 8 4 2 1
1 2 4 8 16 32 64 32 16 8 4 2 1
1 2 4 8 16 32 64 128 64 32 16 8 4 2 1
height = 8
maxHeight = height - 1
for i in range(height):
    k, Max = 1, i * 2 + 1
    print(maxHeight * "     ", end="")  # 5 spaces per level, matching the %5d field width
    maxHeight -= 1
    for j in range(Max):
        print("%5d" % k, end="")
        if (j < (Max // 2)):
            k *= 2
        else:
            k //= 2
    print()
Output:
1
1 2 1
1 2 4 2 1
1 2 4 8 4 2 1
1 2 4 8 16 8 4 2 1
1 2 4 8 16 32 16 8 4 2 1
1 2 4 8 16 32 64 32 16 8 4 2 1
1 2 4 8 16 32 64 128 64 32 16 8 4 2 1
This could be another 9-line solution.
Generate the powers of two as a series.
Find the offset that needs to be added in each row.
Print the leading empty space for each row before printing the palindromic list,
i.e. (offset * (n - i)) times " " (a space).
Build the palindromic series with a slice operation, i.e. temp + temp[::-1][1:].
Print the palindromic series, padding each number with spaces relative to the length of the number being printed.
Code:
n = 8
numbers = [2**x for x in range(n)]  # Generate the series of interest.
offset = len(str(numbers[-1:])) -1  # Field width: digits of the largest number plus one.
for i in range(1, n+1):  # Iterate n times. 1 to n+1 makes slicing easy.
    temp = numbers[:i]  # Slice the series to get the left half of the row.
    print(' ' * (offset * (n - i)), end=" ")  # Prefix spaces, multiples of offset.
    for num in temp + temp[::-1][1:]:  # Generate the palindromic series for the row.
        print(num, end=" " * (offset - len(str(num))))  # Pad each number to the offset width.
    print('')
output:
1
1 2 1
1 2 4 2 1
1 2 4 8 4 2 1
1 2 4 8 16 8 4 2 1
1 2 4 8 16 32 16 8 4 2 1
1 2 4 8 16 32 64 32 16 8 4 2 1
1 2 4 8 16 32 64 128 64 32 16 8 4 2 1
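The same idea can be sketched with f-string field widths, which keeps every number in a fixed-width column. The name pyramid_rows is hypothetical; the field width is assumed to be one more than the number of digits in the largest value.

```python
def pyramid_rows(height):
    """Build the rows of a powers-of-two pyramid as aligned strings."""
    width = len(str(2 ** (height - 1))) + 1  # field width for every number
    rows = []
    for i in range(height):
        left = [2 ** j for j in range(i + 1)]  # 1, 2, 4, ..., 2**i
        row = left + left[-2::-1]              # mirror without repeating the peak
        indent = " " * (width * (height - 1 - i))
        rows.append(indent + "".join(f"{num:>{width}}" for num in row))
    return rows

for row in pyramid_rows(8):
    print(row)
```

Returning the rows as strings, not printing inside the function, makes the layout easy to check or reuse.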
