How to generate a list of numbers in Python

I am working on an algorithm and I am new to Python. I'd like to generate a list of numbers like 4, 7, 8, 11, 12, 13, 16, 17, 18, 19, 22, 23, 24, 25... with two for loops.
I've done some work to find the numbers and I am close to the result I want, which is to generate a list that contains these numbers.
My code is here:
for x in range(0, 6, 1):
    start_ind = int(((x+3) * (x+2)) / 2 + 1)
    print("start index is ", [start_ind], x)
    start_node = node[start_ind]
    for y in range(0, x):
        ind = start_ind + y + 1
        ind_list = node[ind]
        index = [ind_list]
        print(index)
Node is a list:
node = ['n%d' % i for i in range(0, 36, 1)]
What I received from this code is:
start index is [7] 1
['n8']
start index is [11] 2
['n12']
['n13']
start index is [16] 3
['n17']
['n18']
['n19']
start index is [22] 4
['n23']
['n24']
['n25']
['n26']
start index is [29] 5
['n30']
['n31']
['n32']
['n33']
['n34']

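This seems to give the same list, and I think it's much clearer what's happening:
val = 4
result = []
for i in range(1, 7):
    # append a run of i consecutive numbers starting at val
    for j in range(val, val + i):
        result.append(j)
    # the next run starts 3 above the last number appended
    val = j + 3
print(result)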

I don't think you need a loop for this, let alone two:
import numpy as np
dif = np.ones(100, dtype = np.int32)
dif[np.cumsum(np.arange(14))] = 3
(1+np.cumsum(dif)).tolist()
output
[4, 7, 8, 11, 12, 13, 16, 17, 18, 19, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 46, 47, 48, 49, 50, 51, 52, 53, 56, 57, 58, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 121, 122, 123, 124, 125, 126, 127, 128, 129]

ind_list = []
start_ind = 4
for x in range(0, 6):
    ind_list.append(start_ind)
    for y in range(1, x+1):
        ind_list.append(start_ind + y)
    start_ind = ind_list[-1] + 3
print(ind_list)
You could probably use this. It prints the numbers provided. It appends the starting number at the beginning of each outer iteration, and the inner loop gets one step longer each time x increases. I'm assuming the number sequence is 4, 4+3, 4+3+1, 4+3+1+3, 4+3+1+3+1, 4+3+1+3+1+1, 4+3+1+3+1+1+3, ....
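The start-index formula from the question, (x+3)(x+2)/2 + 1, can also drive the whole construction. Below is a minimal sketch, assuming the pattern continues with block k (counting from 0) starting at that index and holding k + 1 consecutive numbers. The function name block_sequence is made up for illustration.

```python
def block_sequence(n_blocks):
    """Return the first n_blocks runs of the sequence 4, 7, 8, 11, 12, 13, ..."""
    result = []
    for k in range(n_blocks):
        # same formula as start_ind in the question, with integer division
        start = (k + 3) * (k + 2) // 2 + 1
        # block k holds k + 1 consecutive numbers
        result.extend(range(start, start + k + 1))
    return result


print(block_sequence(4))  # [4, 7, 8, 11, 12, 13, 16, 17, 18, 19]
```

The outer loop still exists, but the inner one is replaced by range, so no running state has to be carried between blocks.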

Related

Given 100 random numbers with duplicates, count how many fall inside each interval in Python

For example:
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
x = first_interval[0] <= data <= second_interval[0]  # pseudocode: how many values lie in 40-49
y = first_interval[1] <= data <= second_interval[1]  # and so on
I want to know how many numbers from data are between 40-49, 50-59, 60-69 and so on:
frequency = [4, 6] # 4 is x and 6 is y
Iterate over the bounds using zip, then filter the matching values with a list comprehension:
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
result = {}
for start, end in zip(first_interval, second_interval):
    result[(start, end)] = len([v for v in data if start <= v <= end])
print(result)
# {(40, 49): 4, (50, 59): 6, (60, 69): 10, (70, 79): 4, (80, 89): 4, (90, 99): 2}
print(result[(40, 49)])
# 4
The version with a list and len is easier to understand:
result[(start, end)] = len([v for v in data if start <= v <= end])
But the following version performs better on larger inputs: because it uses a generator, it doesn't build a whole list only to discard it afterwards.
result[(start, end)] = sum(1 for v in data if start <= v <= end)
Another version doesn't use the predefined bounds and is faster, as its complexity is O(n) rather than O(n*m): you iterate once over the values, not once per pair of bounds. It relies on the intervals being the decades 40-49, 50-59, and so on.
from collections import defaultdict
result = defaultdict(int)
for value in data:
    start = 10 * (value // 10)
    result[(start, start + 9)] += 1
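To check that the O(n) bucketing agrees with the bounds-based loop, both can be run on the sample data. This is only a sketch; it assumes the intervals are exactly the decades 40-49 through 90-99, as in the question, and the names by_bounds and by_decade are introduced here for illustration.

```python
from collections import defaultdict

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]

# O(n*m): one pass over data for every pair of bounds
by_bounds = {
    (start, end): sum(1 for v in data if start <= v <= end)
    for start, end in zip(first_interval, second_interval)
}

# O(n): derive each value's decade directly from the value
by_decade = defaultdict(int)
for value in data:
    start = 10 * (value // 10)
    by_decade[(start, start + 9)] += 1

print(dict(by_decade) == by_bounds)  # True
```

For intervals that are not regular decades, the bounds-based loop stays correct, while the bucketing formula would have to change.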
This may help you:
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
Data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
def find_occurence(start, end, data):
    # count the values that lie in the closed interval [start, end]
    counter = 0
    for i in data:
        if start <= i <= end:
            counter += 1
    return counter
print(find_occurence(first_interval[0], second_interval[0], Data))  # this gives you the answer for x; do the same for y
Note: start is the lower bound of the interval and end is the upper bound (both inclusive).
We can use numpy.histogram with bins defined by:
first_interval as the left bin edges (each bin is open on the right)
max(second_interval) as the closing edge of the rightmost bin
Code
import numpy as np
# Generate counts and bins (rightmost edge given by max(second_interval))
frequency, bins = np.histogram(data, bins=first_interval + [max(second_interval)])
# Show results
for i in range(len(frequency)):
    if i < len(frequency) - 1:
        print(f'{bins[i]}-{bins[i+1]-1} : {frequency[i]}')  # frequency doesn't include right edge
    else:
        print(f'{bins[i]}-{bins[i+1]} : {frequency[i]}')  # frequency includes right edge in last bin
Output
40-49 : 4
50-59 : 6
60-69 : 10
70-79 : 4
80-89 : 4
90-99 : 2

In Python, create several variables in a loop and then append items to each one

I'm trying to find a way to append items to variables created on the fly:
c = ('a','b','g','d','j')
p = 2
for r in c:
    globals()['ssvar%s' % r] = []
for z in range(0, 10, 1):
    for r in c:
        p = p + 2
        (['ssvar%s' % r]).append(p)
print ssvara #result >>> []
print ssvarb #result >>> []
print ssvarg #result >>> []
print ssvard #result >>> []
print ssvarj #result >>> []
But the expression (['ssvar%s' % r]).append doesn't work.
Can you point me to a similar topic, or tell me how to vary the name of the variable to be filled?
Don't do this, but I think what you were looking to do is:
c = ('a','b','g','d','j')
p = 2
for r in c:
    globals()['ssvar%s' % r] = []
for z in range(0, 10, 1):
    for r in c:
        p = p + 2
        globals()['ssvar%s' % r].append(p)
Instead, you can create your own dictionary (container of key: value pairs) and store the lists in there as values and use the keys as names. If this dictionary is called my_dict, then my_dict['ssvara'] references the list corresponding to 'ssvara', my_dict['ssvarb'] references the list corresponding to 'ssvarb' and so on.
c = ('a','b','g','d','j')
p = 2
my_dict = {}
for r in c:
    my_dict['ssvar%s' % r] = []
for z in range(0, 10, 1):
    for r in c:
        p = p + 2
        my_dict['ssvar%s' % r].append(p)
print(my_dict)
Output
{'ssvara': [4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
'ssvarb': [6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
'ssvard': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
'ssvarg': [8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
'ssvarj': [12, 22, 32, 42, 52, 62, 72, 82, 92, 102]}
If the actual names are not important (you are, after all, creating them dynamically), you can just create a list of lists. If this list is called my_list, my_list[0] references the first sublist, my_list[1] references the second, and so on.
c = ('a','b','g','d','j')
p = 2
my_list = [[] for i in range(len(c))]
for z in range(0, 10, 1):
    for sublist in my_list:
        p = p + 2
        sublist.append(p)
print(my_list)
Output
[[4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
[6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
[8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
[12, 22, 32, 42, 52, 62, 72, 82, 92, 102]]
I don't use Python 2 so I had to make a few interpolations.
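The dictionary approach can be shortened with collections.defaultdict, which creates each list the first time its key is used. This is a minimal Python 3 sketch, not part of the original answer; the names names and lists_by_name are chosen here for illustration.

```python
from collections import defaultdict

names = ('a', 'b', 'g', 'd', 'j')
lists_by_name = defaultdict(list)   # a missing key starts out as an empty list

p = 2
for _ in range(10):
    for name in names:
        p += 2
        lists_by_name['ssvar' + name].append(p)

print(lists_by_name['ssvara'])  # [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
```

No separate initialization loop is needed, and the keys play the role the dynamically named variables were meant to play.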
I am trying to create temporary variables that each hold multiple items.
In fact I need, for example, this result:
ssvara
>>>> ['4','6','8','10','12','14','16','18','20','22']
ssvara
>>>> ['24','26','28','30','32','34','36','38','40','42']
The value of p is not important; the main thing is to be able to append, even with a local variable.

How to get all possible values from a list of numbers without itertools

I want to create an algorithm that finds all values that can be created with the four basic operations + - * / from a list l of numbers n, where 2 <= len(l) <= 6 and every n >= 1.
All numbers must be integers.
I have seen a lot of similar topics, but I don't want to use the itertools approach; I want to understand why my recursive program doesn't work.
I tried to write a costly recursive program that searches all possible combinations exhaustively, like a tree with n = len(l) starting points, where each tree has depth n.
L: the list of starting numbers
C: the current value
M: the list of all possible values
My code:
from copy import deepcopy

def result(L,C,M):
    if len(L)>0:
        for i in range(len(L)) :
            a=L[i]
            if C>=a:
                l=deepcopy(L)
                l.remove(a)
                m=[] # new current values
                #+
                m.append(C+a)
                # * 1 is useless
                if C !=1 or a !=1:
                    m.append(C*a)
                # must be integer
                if C%a==0 and a<=C: # a can't be ==0
                    m.append(C//a)
                #0 is useless
                if C!=a:
                    m.append(C-a)
                for r in m: #update all values possible
                    if r not in M:
                        M.append(r)
                for r in m: # call the function again with new current values, and updated list of remaining numbers
                    result(l,r,M)
def values_possible(L) :
    m=[]
    for i in L:
        l=deepcopy(L)
        l.remove(i)
        result(l,i,m)
    m.sort()
    return m
For small lists without duplicate numbers, my algorithm seems to work but with lists like [1,1,2,2,4,5] it misses some values.
It returns:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 94, 95, 96, 97, 98, 99, 100, 101,
102, 104, 105, 110, 112, 115, 116, 118, 119, 120, 121, 122, 124, 125, 128, 130,
140, 160]
but it misses 93,108,114,117,123,126,132,135,150,180.
Let's take an even simpler example: [1, 1, 2, 2].
One of the numbers your algorithm can't find is 9 = (1 + 2) * (1 + 2).
Your algorithm simply cannot come up with this computation because it always deals with a "current" value C. You can start with C = 1 + 2, but you cannot find the next 1 + 2 because it has to be constructed separately.
So your recursion will have to do at least some kind of partitioning into two groups, finding all the answers for each group and then combining them.
Something like this could work:
def partitions(L):
    if not L:
        yield ([], [])
    else:
        for l, r in partitions(L[1:]):
            yield [L[0]] + l, r
            yield l, [L[0]] + r
def values_possible(L):
    if len(L) == 1:
        return L
    results = set()
    for a, b in partitions(L):
        if not a or not b:
            continue
        for va in values_possible(a):
            for vb in values_possible(b):
                results.add(va + vb)
                results.add(va * vb)
                if va > vb:
                    results.add(va - vb)
                if va % vb == 0:
                    results.add(va // vb)
    return results
Not too efficient though.
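One possible way to cut the repeated work is to cache results per sub-multiset. The sketch below is an assumption-laden variant of the partition idea, not the answerer's code: it takes a sorted tuple so that equal sub-multisets share one cache entry, and, like the code above, it only returns values that use every number exactly once. The names split_in_two and values_using_all are made up for illustration.

```python
from functools import lru_cache


def split_in_two(items):
    """Yield every split of a tuple into two non-empty tuples, chosen by position."""
    n = len(items)
    for mask in range(1, 2 ** n - 1):        # skip the all-left and all-right masks
        left = tuple(x for i, x in enumerate(items) if (mask >> i) & 1)
        right = tuple(x for i, x in enumerate(items) if not (mask >> i) & 1)
        yield left, right


@lru_cache(maxsize=None)
def values_using_all(items):
    """Values reachable with + - * / using every number in the sorted tuple once."""
    if len(items) == 1:
        return frozenset(items)
    results = set()
    for left, right in split_in_two(items):
        for va in values_using_all(left):
            for vb in values_using_all(right):
                results.add(va + vb)
                results.add(va * vb)
                if va > vb:                  # keep results positive
                    results.add(va - vb)
                if va % vb == 0:             # integer division only
                    results.add(va // vb)
    return frozenset(results)


print(9 in values_using_all((1, 1, 2, 2)))  # True
```

Because sub-tuples of a sorted tuple stay sorted, (1, 2) taken from different positions hits the same cache entry. Inputs are assumed to be integers >= 1, so no intermediate value is ever zero.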

Remove elements from array updating list of stored indexes accordingly

Consider a numpy array of the form:
> a = np.random.uniform(0., 100., (10, 1000))
and a list of indexes to elements in that array that I want to keep track of:
> idx_s = [0, 5, 7, 9, 12, 17, 19, 32, 33, 35, 36, 39, 40, 41, 42, 45, 47, 51, 53, 57, 59, 60, 61, 62, 63, 65, 66, 70, 71, 73, 75, 81, 83, 85, 87, 88, 89, 90, 91, 93, 94, 96, 98, 100, 106, 107, 108, 118, 119, 121, 124, 126, 127, 128, 129, 133, 135, 138, 142, 143, 144, 146, 147, 150]
I also have a list of indexes of elements I need to remove from a:
> idx_d = [4, 12, 18, 20, 21, 22, 26, 28, 29, 31, 37, 43, 48, 54, 58, 74, 80, 86, 99, 109, 110, 113, 117, 134, 139, 140, 141, 148, 154, 156, 160, 166, 169, 175, 183, 194, 198, 199, 219, 220, 237, 239, 241, 250]
which I delete with:
> a_d = np.delete(a, idx_d, axis=1)
But this process alters the indexes of elements in a_d. The indexes in idx_s no longer point in a_d to the same elements as in a, since np.delete() moved them. For example: if I delete the element at index 4 from a, then every index after 4 in idx_s points one element too far to the right in a_d.
                 v Index 5 points to 'f' in a
       0 1 2 3 4 5 6
a   -> a b c d e f g ...   # Remove element 'e' (index 4) from a
a_d -> a b c d f g h ...   # Now index 5 no longer points to 'f' in a_d, but to 'g'
       0 1 2 3 4 5 6
How do I update the idx_s list of indexes so that they point to the same elements in a_d as they did in a?
If an index in idx_s is also present in idx_d (so its element is removed from a and not present in a_d), that index should be discarded as well.
You could use np.searchsorted to get the shifts for each element in idx_s and then simply subtract those from idx_s for the new shifted-down values, like so -
idx_s - np.searchsorted(idx_d, idx_s)
Indexes in idx_s that also appear in idx_d have to be dropped first, and if idx_d is not already sorted, we need to feed in a sorted version. Thus, assuming for simplicity that both are arrays, we would have -
idx_s = idx_s[~np.in1d(idx_s, idx_d)]
out = idx_s - np.searchsorted(np.sort(idx_d), idx_s)
A sample run to help out getting a better picture -
In [530]: idx_s
Out[530]: array([19, 5, 17, 9, 12, 7, 0])
In [531]: idx_d
Out[531]: array([12, 4, 18])
In [532]: idx_s = idx_s[~np.in1d(idx_s, idx_d)] # Remove matching ones
In [533]: idx_s
Out[533]: array([19, 5, 17, 9, 7, 0])
In [534]: idx_s - np.searchsorted(np.sort(idx_d), idx_s) # Updated idx_s
Out[534]: array([16, 4, 15, 8, 6, 0])
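A quick way to convince yourself the shift is right is to compare the elements before and after deletion. This sketch uses a small 1-D array as a stand-in for the asker's 2-D one and np.isin in place of np.in1d; the variable names kept and new_idx are chosen here for illustration.

```python
import numpy as np

a = np.arange(20) * 10                     # toy 1-D array: a[i] == 10 * i
idx_s = np.array([19, 5, 17, 9, 12, 7, 0])
idx_d = np.array([12, 4, 18])

# drop tracked indexes whose elements are about to be deleted
kept = idx_s[~np.isin(idx_s, idx_d)]
# each kept index moves left by the number of deleted indexes below it
new_idx = kept - np.searchsorted(np.sort(idx_d), kept)

a_d = np.delete(a, idx_d)
print(new_idx.tolist())                        # [16, 4, 15, 8, 6, 0]
print(np.array_equal(a[kept], a_d[new_idx]))   # True
```

For the 2-D case in the question, the same indexes would be applied along axis 1, for example a_d[:, new_idx].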
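idx_s = [0, 5, 7, 9, 12, 17, 19]
idx_d = [4, 12, 18]
# Both lists must be sorted. i counts the deleted indexes seen so far.
def worker(a, v, i=0):
    if not a:
        return []
    elif not v:
        return [x - i for x in a]  # no deletions left: shift the remaining indexes by i
    elif a[0] == v[0]:
        return worker(a[1:], v[1:], i+1)  # this index is deleted: discard it
    elif a[0] < v[0]:
        return [a[0]-i] + worker(a[1:], v, i)
    else:
        return worker(a, v[1:], i+1)  # deletion lies before a[0]: count it and retry
worker(idx_s, idx_d)
# [0, 4, 6, 8, 15, 16]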

slicing data in pandas

This is likely a really simple question, but it's one I've been confused about and stuck on for a while, so I'm hoping I might get some help.
I'm using cross validation to test my data set, but I'm finding that indexing the pandas df is not working as I'm expecting. Specifically, when I print out x_test, I find that there are no data points for x_test. In fact, there are indexes but no columns.
k = 10
N = len(df)
n = N/k + 1
for i in range(k):
    print i*n, i*n+n
    x_train = df.iloc[i*n: i*n+n]
    y_train = df.iloc[i*n: i*n+n]
    x_test = df.iloc[0:i*n, i*n+n:-1]
    print x_test
Typical output:
0 751
Empty DataFrame
Columns: []
Index: []
751 1502
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
I'm trying to work out how to get the data to show up. Any thoughts?
Why don't you use sklearn.cross_validation.KFold (sklearn.model_selection.KFold in current scikit-learn releases)? There is a clear example on this site...
UPDATE:
For every subset you have to specify the columns as well: x_train and x_test must exclude the target column, and y_train must contain only the target column. See slicing and indexing for more details.
target = 'target' # name of target column
list_features = df.columns.tolist() # use all columns at model training
list_features.remove(target) # excluding "target" column
k = 10
N = len(df)
n = int(N/k) + 1 # 'int()' is necessary in Python 3
for i in range(k):
    print(i*n, i*n+n)
    x_train = df.loc[i*n: i*n+n-1, list_features] # '.loc[]' is inclusive, that's why "-1" is present
    y_train = df.loc[i*n: i*n+n-1, target] # specify columns after ","
    x_test = df.loc[~df.index.isin(range(int(i*n), int(i*n+n))), list_features]
    print(x_test)
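The empty result in the question comes from the comma inside iloc: everything before the comma selects rows and everything after it selects columns, so the second slice was read as a column range. The sketch below reproduces that on a hypothetical 10-row frame (the column names feature and target are made up) and shows one way, using DataFrame.drop, to get "all rows except the current fold".

```python
import pandas as pd

# hypothetical toy data standing in for the asker's df
df = pd.DataFrame({"feature": range(10), "target": range(10, 20)})
k = 3
n = len(df) // k + 1                      # fold size, computed as in the question

folds = []
for i in range(k):
    start, stop = i * n, min(i * n + n, len(df))
    held_out = df.iloc[start:stop]        # rows of the current fold
    rest = df.drop(df.index[start:stop])  # every other row, all columns kept
    folds.append((held_out, rest))

# Rows 0:4 but columns 8:-1, which do not exist: an index with no columns
no_columns = df.iloc[0:4, 8:-1]
print(no_columns.shape)                   # (4, 0)
```

Splitting features from the target can then be done on held_out and rest by column name, as in the answer above.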
