How to make a list of lists in Python

I am trying to make a list of lists in python using random.random().
def takeStep(prevPosition, maxStep):
    """simulates taking a step between positive and negative maxStep, \
    adds it to prevPosition and returns next position"""
    nextPosition = prevPosition + (-maxStep + \
                   ( maxStep - (-maxStep)) * random.random())
list500Steps = []
list1000Walks = []
for kk in range(0,1000):
    list1000Walks.append(list500Steps)
    for jj in range(0 , 500):
        list500Steps.append(list500Steps[-1] + takeStep(0 , MAX_STEP_SIZE))
I know why this gives me what it does; I just don't know how to fix it. Please give the simplest answer, as I'm new at this and don't know a lot yet.

for kk in range(0,1000):
    list500steps = []
    for jj in range(0,500):
        list500steps.append(...)
    list1000walks.append(list500steps)
Notice how I am creating an empty list (list500steps) each time through the outer for loop? Then, after creating all the steps, I append that list (which is now NOT empty) to the list of walks.
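Here is a minimal runnable sketch of that idea. The value of MAX_STEP_SIZE, the smaller loop counts, and the names take_step, NUM_WALKS and NUM_STEPS are assumptions made for illustration; they are not from the question. Each walk starts at 0.0 so that the [-1] lookup always has something to read.

```python
import random

MAX_STEP_SIZE = 1.0   # assumed value; the question never defines it
NUM_WALKS = 5         # the question uses 1000
NUM_STEPS = 10        # the question uses 500

def take_step(prev_position, max_step):
    """Return prev_position moved by a random amount in [-max_step, max_step]."""
    return prev_position + random.uniform(-max_step, max_step)

walks = []
for _ in range(NUM_WALKS):
    steps = [0.0]                      # a brand-new list for every walk
    for _ in range(NUM_STEPS):
        steps.append(take_step(steps[-1], MAX_STEP_SIZE))
    walks.append(steps)                # append only after the walk is complete

print(len(walks), len(walks[0]))
```

Because steps is rebound to a fresh list on every pass of the outer loop, the entries of walks are separate lists rather than many references to one list.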

import random
def takeStep(prevPosition, maxStep):
    """simulates taking a step between positive and negative maxStep, \
    adds it to prevPosition and returns next position"""
    nextPosition = prevPosition + (-maxStep + \
                   ( maxStep - (-maxStep)) * random.random())
    # You didn't have this return. I'm not exactly sure what you were going for,
    # but I think this is it. Without it the function returns None and the
    # addition in the loop below fails.
    return nextPosition
list500Steps = [0]
list1000Walks = [0]
# The zeros are placeholders for the jj loop below. That way, the first time
# through the loop, list500Steps[-1] has something to index; on an empty list
# it raises an error. I don't know if that was your problem, but it isn't good
# either way.
for kk in range(0,1000):
    list1000Walks.append(list500Steps)
    for jj in range(0 , 500):
        list500Steps.append(list500Steps[-1] + takeStep(0 , MAX_STEP_SIZE))
# I hope that for MAX_STEP_SIZE you intend to (a) define the variable and (b) assign it a number
You might want to print one of the lists to check the result.

Related

Walk along 2D numpy array as long as values remain the same

Short description
I want to walk along a numpy 2D array starting from different points in specified directions (either 1 or -1) until a column changes (see below)
Current code
First let's generate a dataset:
import numpy as np
# Generate a big random dataset
# first column is the id, second is the metadata value, third is the direction (1 or -1)
np.random.seed(123)
c1 = np.random.randint(0,100,size = 1000000)
c2 = np.random.randint(0,20,size = 1000000)
c3 = np.random.choice([1,-1],1000000 )
m = np.vstack((c1, c2, c3)).T
m = m[m[:,0].argsort()]
Then I wrote the following code that starts at specific rows in the matrix (start_points) then keeps extending in the specified direction (direction_array) until the metadata changes:
def walk(mat, start_array):
    start_mat = mat[start_array]
    metadata = start_mat[:,1]
    direction_array = start_mat[:,2]
    walk_array = start_array
    while True:
        walk_array = np.add(walk_array, direction_array)
        try:
            walk_mat = mat[walk_array]
            walk_metadata = walk_mat[:,1]
            if sorted(metadata) != sorted(walk_metadata):
                raise IndexError
        except IndexError:
            return start_mat, mat[walk_array + (direction_array *-1)]
s = time.time()
for i in range(100000):
    start_points = np.random.randint(0,1000000,size = 3)
    res = walk(m, start_points)
Question
While the above code works fine I think there must be an easier/more elegant way to walk along a numpy 2D array from different start points until the value of another column changes? This for example requires me to slice the input array for every step in the while loop which seems quite inefficient (especially when I have to run walk millions of times).
You don't have to index the whole input array inside the while loop. You can just use the column whose values you want to check.
I also refactored your code a little so that there is no while True statement and no if that raises an error for no particular reason.
Code:
def walk(mat, start_array):
    start_mat = mat[start_array]
    metadata = sorted(start_mat[:,1])
    direction_array = start_mat[:,2]
    data = mat[:,1]
    walk_array = np.add(start_array, direction_array)
    try:
        while metadata == sorted(data[walk_array]):
            walk_array = np.add(walk_array, direction_array)
    except IndexError:
        pass
    return start_mat, mat[walk_array - direction_array]
In this particular case, if len(start_array) is large (thousands of elements), you could use collections.Counter instead of sorting, which should be faster.
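A small sketch of the Counter comparison, in case it helps. The helper name same_multiset and the sample arrays are made up for illustration. Counting is linear in the number of elements while sorting is O(n log n), but whether it is actually faster for your sizes is something to measure.

```python
from collections import Counter
import numpy as np

def same_multiset(a, b):
    """True if a and b hold the same values with the same multiplicities."""
    # tolist() converts NumPy scalars to plain Python ints before counting
    return Counter(a.tolist()) == Counter(b.tolist())

start_meta = np.array([3, 7, 3, 1])
print(same_multiset(start_meta, np.array([1, 3, 3, 7])))   # same values, different order
print(same_multiset(start_meta, np.array([1, 3, 7, 7])))   # a 3 was replaced by a 7
```

In walk, the condition metadata == sorted(data[walk_array]) could then be expressed with a Counter built once from the start metadata and compared against Counter(data[walk_array].tolist()) on each step.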
I was also thinking of another approach: build an array of the desired slices, each running in the correct direction.
This approach seems very dirty, but I will post it anyway in case you find it useful.
Code:
def walk(mat, start_array):
    start_mat = mat[start_array]
    metadata = sorted(start_mat[:,1])
    direction_array = start_mat[:,2]
    data = mat[:,1]
    walk_slices = zip(*[
        data[start_array[i]+direction_array[i]::direction_array[i]]
        for i in range(len(start_array))
    ])
    for step, walk_metadata in enumerate(walk_slices):
        if metadata != sorted(walk_metadata):
            break
    return start_mat, mat[start_array + (direction_array * step)]
To perform operation starting from a single row, define the following class:
class Walker:
    def __init__(self, tbl, row):
        self.tbl = tbl
        self.row = row
        self.dir = self.tbl[self.row, 2]
    # How many rows can I move from "row" in the indicated direction
    # while metadata doesn't change
    def numEq(self):
        # Metadata from "row" in the required direction
        md = self.tbl[self.row::self.dir, 1]
        return ((md != md[0]).cumsum() == 0).sum() - 1
    # Get row "n" positions from "row" in the indicated direction
    def getRow(self, n):
        return self.tbl[self.row + n * self.dir]
Then, to get the result, run:
def walk_2(m, start_points):
    # Create walkers for each starting point
    wlk = [ Walker(m, n) for n in start_points ]
    # How many rows can I move
    dist = min([ w.numEq() for w in wlk ])
    # Return rows from changed positions
    return np.vstack([ w.getRow(dist) for w in wlk ])
The execution time of my code is roughly the same as yours,
but in my opinion my code is more readable and concise.

Python For loop not incrementing

clean_offset = len(malware)
tuple_clean = []
tuple_malware = []
for i in malware:
    tuple_malware.append([malware.index(i), 0])
    print(malware.index(i))
print(tuple_malware)
for j in clean:
    tuple_clean.append([(clean_offset + clean.index(j)), 1])
    print(clean.index(j))
print(tuple_clean)
import pdb; pdb.set_trace()
training_data_size_mal = 0.8 * len(malware)
training_data_size_clean = 0.8 * len(clean)
i increments as normal and produces correct output; however, j remains at 0 for three loops and then jumps to 3. I don't understand this.
There is a logical error in clean.index(j).
list.index returns the index of the first matching element in the list.
So if the list contains equal values, the repeated ones all report the index of the first occurrence.
You can inspect this with the code below.
malware = [1,2,3,4,5,6,7,8,8,8,8,8,2]
clean = [1,2,3,4,4,4,4,4,4,2,4,4,4,4]
clean_offset = len(malware)
tuple_clean = []
tuple_malware = []
for i in malware:
    tuple_malware.append([malware.index(i), 0])
    print(malware.index(i))
print(tuple_malware)
for j in clean:
    tuple_clean.append([(clean_offset + clean.index(j)), 1])
    print(clean.index(j))
print(tuple_clean)
training_data_size_mal = 0.8 * len(malware)
training_data_size_clean = 0.8 * len(clean)
In
for a in something:
a is each item contained in something, not its index.
For example:
for n in [1, 10, 9, 3]:
    print(n)
gives
1
10
9
3
You either want
for i in range(len(malware)):
or
for i, element in enumerate(malware):
at which point i is the index and element is the item at that index, malware[i].
The last one is considered best practice when you need both the index and the element at that index in the loop.
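As a rough sketch of the enumerate version applied to the question's lists: the sample file names below are invented, and they deliberately contain duplicates so the difference from .index() is visible.

```python
malware = ["a.exe", "b.exe", "a.exe"]   # hypothetical data with a duplicate
clean = ["x.txt", "x.txt", "y.txt"]     # hypothetical data with a duplicate

clean_offset = len(malware)

# .index() reports the first occurrence, so duplicates repeat an index
via_index = [malware.index(item) for item in malware]

# enumerate yields the true position of every element
tuple_malware = [[i, 0] for i, _ in enumerate(malware)]
tuple_clean = [[clean_offset + j, 1] for j, _ in enumerate(clean)]

print(via_index)       # [0, 1, 0]
print(tuple_malware)   # [[0, 0], [1, 0], [2, 0]]
print(tuple_clean)     # [[3, 1], [4, 1], [5, 1]]
```

Since only the position is used here, range(len(malware)) would work just as well; enumerate pays off once the element itself is needed too.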
The OP has already figured out the answer, but in case anyone is wondering or needs a TL;DR of Barkin's comment, it's just a small correction:
replace
for i in malware
for j in clean
with
for i in range(len(malware))
for j in range(len(clean))
and finally remove the .index() calls and use i and j in their place.

More on dynamic programming

Two weeks ago I posted THIS question here about dynamic programming. User Andrea Corbellini answered precisely what I wanted, but I want to take the problem one step further.
This is my function
def Opt(n):
    if len(n) == 1:
        return 0
    else:
        return sum(n) + min(Opt(n[:i]) + Opt(n[i:])
                            for i in range(1, len(n)))
Let's say you would call
Opt( [ 1,2,3,4,5 ] )
The previous question solved the problem of computing the optimal value. Now, instead of computing the optimal value 33 for the above example, I want to print how we got to the optimal solution (the path to it). So, I want to print, as a list, the indices where the list got cut/divided to reach the optimal solution. The answer to the above example would be:
[ 3,2,1,4 ] ( Cut the pole/list at third marker/index, then after second index, then after first index and lastly at fourth index).
That is the answer should be in the form of a list. The first element of the list will be the index where the first cut/division of the list should happen in the optimal path. The second element will be the second cut/division of the list and so on.
There can also be a different solution:
[ 3,4,2,1 ]
Both would still lead you to the correct output, so it doesn't matter which one you print. But I have no idea how to trace and print the optimal path taken by the dynamic programming solution.
By the way, I figured out a non-recursive solution to the problem that was solved in my previous question. But I still can't figure out how to print the path for the optimal solution. Here is the non-recursive code for the previous question; it might be helpful in solving the current problem.
def Opt(numbers):
    prefix = [0]
    for i in range(1,len(numbers)+1):
        prefix.append(prefix[i-1]+numbers[i-1])
    results = [[]]
    for i in range(0,len(numbers)):
        results[0].append(0)
    for i in range(1,len(numbers)):
        results.append([])
        for j in range(0,len(numbers)):
            results[i].append([])
    for i in range(2,len(numbers)+1): # for all lengths (off by 1)
        for j in range(0,len(numbers)-i+1): # for all beginnings
            results[i-1][j] = results[0][j]+results[i-2][j+1]+prefix[j+i]-prefix[j]
            for k in range(1,i-1): # for all splits
                if results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j] < results[i-1][j]:
                    results[i-1][j] = results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j]
    return results[len(numbers)-1][0]
Here is one way of printing the selected splits:
I used the recursive solution with memoization provided by @Andrea Corbellini in your previous question. It is shown below:
cache = {}
def Opt(n):
    # tuple objects are hashable and can be put in the cache.
    n = tuple(n)
    if n in cache:
        return cache[n]
    if len(n) == 1:
        result = 0
    else:
        result = sum(n) + min(Opt(n[:i]) + Opt(n[i:])
                              for i in range(1, len(n)))
    cache[n] = result
    return result
Now, we have the cache values for all the tuples including the selected ones.
Using this, we can print the selected tuples as shown below:
selectedList = []
def printSelected (n, low):
    if len(n) == 1:
        # No need to print because it's
        # already printed at previous recursion level.
        return
    minVal = float('inf')
    minTupleLeft = ()
    minTupleRight = ()
    splitI = 0
    for i in range(1, len(n)):
        tuple1ToI = tuple (n[:i])
        tupleiToN = tuple (n[i:])
        if (cache[tuple1ToI] + cache[tupleiToN]) < minVal:
            minVal = cache[tuple1ToI] + cache[tupleiToN]
            minTupleLeft = tuple1ToI
            minTupleRight = tupleiToN
            splitI = low + i
    print minTupleLeft, minTupleRight, minVal
    print splitI # OP just wants the split index 'i'.
    selectedList.append(splitI) # or add to the list as requested by OP
    printSelected (list(minTupleLeft), low)
    printSelected (list(minTupleRight), splitI)
Call the above function as shown below (after Opt(n) has filled the cache):
printSelected (n, 0)
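For anyone on Python 3, here is a self-contained sketch of the same trace-back idea. It is not the code from either post: the names opt_with_cuts, solve and trace are my own, it memoizes on (lo, hi) index pairs with functools.lru_cache instead of caching tuples, and it assumes a non-empty input list. When several splits tie, it keeps the first one found.

```python
from functools import lru_cache

def opt_with_cuts(numbers):
    """Return (optimal cost, cut positions in the order the cuts are made)."""
    numbers = tuple(numbers)

    @lru_cache(maxsize=None)
    def solve(lo, hi):
        # Best (cost, split) for numbers[lo:hi]; split is an absolute index.
        if hi - lo == 1:
            return 0, None
        total = sum(numbers[lo:hi])
        best_cost, best_split = float("inf"), None
        for k in range(lo + 1, hi):
            cost = solve(lo, k)[0] + solve(k, hi)[0]
            if cost < best_cost:          # strict <, so the first tie wins
                best_cost, best_split = cost, k
        return total + best_cost, best_split

    cuts = []

    def trace(lo, hi):
        # Record this segment's cut, then the cuts of its left and right parts.
        split = solve(lo, hi)[1]
        if split is None:
            return
        cuts.append(split)
        trace(lo, split)
        trace(split, hi)

    trace(0, len(numbers))
    return solve(0, len(numbers))[0], cuts

print(opt_with_cuts([1, 2, 3, 4, 5]))   # prints (33, [3, 2, 1, 4])
```

The key point is the same as above: store, next to each optimal cost, the split that produced it, then walk those stored splits from the full list down to single elements.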

Python 'for' loop issue: why are these two variables not adding together properly in my 'for' loop?

I am writing a code snippet for a random algebraic equation generator for a larger project. Up to this point, everything has worked well. The main issue is simple. I combine the contents of a dictionary in sequential order. For the sake of argument, say the dictionary is exdict = {a:1 , b:2 , c:3 , d:4}; I append those to a list like so: exlist = [a, b, c, d, 1, 2, 3, 4]. The length of my list is 8, and half of that is 4. The algorithm is quite simple: whatever random number is generated between 1 and 4 (or, as Python indexes them, 0 to 3), adding half of the length of the list to that index gives the index of the matching value.
I have done research online and on stackoverflow but cannot find any answer that I can apply to my situation...
Below is the bug check version of my code. It prints out each variable as it happens. The issue I am having is towards the bottom, under the ### ITERATIONS & SETUP comment. The rest of the code is there so it can be run properly. The primary issue is that a + x should be m, but a + x never equals m; m is always tragically lower.
Bug check code:
from random import randint as ri
from random import shuffle as sh
#def randomassortment():
letterss = ['a','b','x','d','x','f','u','h','i','x','k','l','m','z','y','x']
rndmletters = letterss[ri(1,15)]
global newdict
newdict = {}
numberss = []
for x in range(1,20):
    #range defines max number in equation
    numberss.append(ri(1,20))
for x in range(1,20):
    rndmnumber = numberss[ri(1,18)]
    rndmletters = letterss[ri(1,15)]
    newdict[rndmletters] = rndmnumber
#x = randomassortment()
#print x[]
z = []
# set variable letter : values in list
for a in newdict.keys():
    z.append(a)
for b in newdict.values():
    z.append(b)
x = len(z)/2
test = len(z)
print 'x is value %d' % (x)
### ITERATIONS & SETUP
iteration = ri(2,6)
for x in range(1,iteration):
    a = ri(1,x)
    m = a + x
    print 'a is value: %d' % (a)
    print 'm is value %d' %(m)
    print
    variableletter = z[a]
    variablevalue = z[m]
    # variableletter , variablevalue
Edit: My question is ultimately, why is a + x returning a value that isn't a + x? If you run this code, it will print x, a, and m. m is supposed to be the value of a + x, but for some reason it isn't.
The reason this isn't working as you expect is that your variable x originally holds half the length of the list, but it is replaced by the loop variable in your for x in range loop, and then you expect it to still equal half the length of the list. You could just change the line to
for i in range(iteration):
instead.
Also note that you could replace all the code in the for loop with
variableletter, variablevalue = random.choice(newdict.items())
(this needs import random; in Python 3, wrap the argument in list()).
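A short sketch of the shadowing described above, written for Python 3. The list z mirrors the exlist example from the question; the names half and pairs are assumptions added for illustration.

```python
z = ["a", "b", "c", "d", 1, 2, 3, 4]   # keys first, then values, as in the question
x = len(z) // 2                        # x is 4 here

for x in range(1, 3):                  # the loop rebinds x: 1, then 2
    pass
print(x)                               # the original 4 is gone; this prints 2

half = len(z) // 2                     # fix: keep the offset under its own name
pairs = []
for i in range(half):                  # i is the only name the loop rebinds
    pairs.append((z[i], z[i + half]))
print(pairs)
```

A for loop does not create a new scope in Python, so reusing a name as the loop variable overwrites whatever that name held before.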
Your problem is scope: which x are you referring to here?
x = len(z)/2 # This is the first x
print 'x is value %d' % (x)
### ITERATIONS & SETUP
iteration = ri(2,6)
# x in the for loop is the loop variable from range, which overwrites the first x
for x in range(1,iteration):
    a = ri(1,x)
    m = a + x

Better looping, for string manipulation (python)

If I have this code:
s = 'abcdefghi'
for grp in (s[:3],s[3:6],s[6:]):
    print "'%s'"%(grp)
    total = calc_total(grp)
    if (grp==s[:3]):
        # more code than this
        p = total + random_value
        x1 = my_function(p)
    if (grp==s[3:6]):
        # more code than this
        p = total + x1
        x2 = my_function(p)
    if (grp==s[6:]):
        # more code than this
        p = total + x2
        x3 = my_function(p)
If the group is the first group, run the code for that group; if it is the second group, run code that uses a value generated by the first group's code; the same applies to the third group, which uses a value generated by the second group's code.
How can I tidy this up to use better looping?
Thanks
I may have misunderstood what you're doing, but it appears that you want to do something to s[:3] on the first iteration, something different to s[3:6] on the second, and something else again to s[6:] on the third. In other words, that isn't a loop at all! Just write those three blocks of code out one after another, with s[:3] and so on in place of grp.
I must say I agree with Peter in that the loop is redundant. If you are afraid of duplicating code, then just move the repeating code into a function and call it multiple times:
s = 'abcdefghi'
def foo(grp):
    # Anything more you would like to happen over and over again
    print "'%s'"%(grp)
    return calc_total(grp)
def bar(grp, value):
    total = foo(grp)
    # more code than this
    return my_function(total + value)
x1 = bar(s[:3], random_value)
x2 = bar(s[3:6], x1)
x3 = bar(s[6:], x2)
If
# more code than this
contains non-duplicate code, then you must of course move that out of "bar" (which together with "foo" should be given a more descriptive name).
I'd code something like this as follows:
for i, grp in enumerate((s[:3],s[3:6],s[6:])):
    print "'%s'"%(grp)
    total = calc_total(grp)
    # more code that needs to happen every time
    if i == 0:
        # code that needs to happen only the first time
    elif i == 1:
        # code that needs to happen only the second time
etc. The == checks can be misleading if one of the groups "just happens" to be the same as another one, while the enumerate approach runs no such risk.
x = reduce(lambda x, grp: my_function(calc_total(list(grp)) + x),
map(None, *[iter(s)] * 3), random_value)
At the end, you'll have the last x.
Or, if you want to keep the intermediary results around,
x = []
for grp in map(None, *[iter(s)] * 3):
    x.append(my_function(calc_total(list(grp)) + (x or [random_value])[-1]))
Then you have x[0], x[1], x[2].
Get your data into the list you want, then try the following:
output = 0
seed = get_random_number()
for group in input_list:
    total = get_total(group)
    p = total + seed
    seed = my_function(p)
input_list will need to look like ['abc', 'def', 'ghi']. But if you want to extend it to ['abc','def','ghi','jkl','mno','pqr'], this should still work.
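A runnable Python 3 sketch of that loop, for illustration only. The bodies of calc_total and my_function are hypothetical stand-ins (the question never shows them), the seed value 7 is arbitrary, and chain is a name introduced here.

```python
def calc_total(group):
    """Hypothetical stand-in: sum of the character codes in the group."""
    return sum(ord(ch) for ch in group)

def my_function(p):
    """Hypothetical stand-in: keep the value small."""
    return p % 100

def chain(groups, seed):
    """Feed each group's result into the next; return all intermediate results."""
    results = []
    for group in groups:
        seed = my_function(calc_total(group) + seed)
        results.append(seed)
    return results

s = "abcdefghi"
groups = [s[i:i + 3] for i in range(0, len(s), 3)]   # ['abc', 'def', 'ghi']
print(groups)
print(chain(groups, seed=7))                         # [1, 4, 16] with these stand-ins
```

The returned list plays the role of x1, x2 and x3 from the question, and the same loop handles any number of groups.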
