changing array element not work in python - python

I wrote a simple code in python for doing a sorting manually because I needed my labels to change according to the numbers too. However, swapping the elements seems to be not working. Am I forgetting something fundamental?
Here is my code:
h = np.array(prediction)
h = h.transpose()
t = np.array(Z.columns.values)
for l in range(0, h.size-3):
if h[0,l] < h[0, l+1]:
g = h[0, l]
h[0, l] = h[0, l+1]
h[0, l+1] = g
uu = t[l]
t[l] = t[l+1]
t[l+1] = uu
UPDATE 1:
Suppose that I have two arrays for demonstrating the number of cars in a parking and I want to sort it like to know which cars are more available in the parking. Both h and t are 1*N arrays. t holds the name of each car, for example, BMW', 'TOYOTA' and etc, and h holds the number of them for example 5,6 and etc. I want to sort the h array which is numbers and if an element has changed in the h, I want to change the name of relative cars in the other array too. It is a really simple thing for doing with other languages I know but because I am new in python I got into trouble. The interesting part is that code runs perfectly without error but the arrays don't change at all.

I don't fully understand your requirements, but at a glance, there are a bunch of assignments happening within that if, and none outside of it. If your arrays aren't changing my guess is that the code inside the if isn't executing, either because the test in the if is failing, or because range(0, h.size-3) is empty.
Stick some prints in there and see if that code is actually running!

Related

Converting MatLab for loop with array code to python

I was given code in Matlab made by someone else and asked to convert to python. However, I do not know MatLab.This is the code:
for i = 1:nWind
[input(a:b,:), t(a:b,1)] = EulerMethod(A(:,:,:,i),S(:,:,i),B(:,i),n,scale(:,i),tf,options);
fprintf("%d\n",i);
for j = 1:b
vwa = generate_wind([input(j,9);input(j,10)],A(:,:,:,i),S(:,:,i),B(:,i),n,scale(:,i));
wxa(j) = vwa(1);
wya(j) = vwa(2);
end
% Pick random indexes for filtered inputs
rand_index = randi(tf/0.01-1,1,filter_size);
inputf(c:d,:) = input(a+rand_index,:);
wxf(c:d,1) = wxa(1,a+rand_index);
wyf(c:d,1) = wya(1,a+rand_index);
wzf(c:d,1) = 0;
I am confused on what [input(a:b,:), t(a:b,1)] mean and if wxf, wzf, wyf are part of the MatLab library or if it's made. Also, EulerMethod and generate_wind are seprate classes. Can someone help me convert this code to python?
The only thing I really changed so far is changing the for loop from:
for i = 1:nWind
to
for i in range(1,nWind):
There's several things to unpack here.
First, MATLAB indexing is 1-based, while Python indexing is 0-based. So, your for i = 1:nWind from MATLAB should translate to for i in range(0,nWind) in Python (with the zero optional). For nWind = 5, MATLAB would produce 1,2,3,4,5 while Python range would produce 0,1,2,3,4.
Second, wxf, wyf, and wzf are local variables. MATLAB is unique in that you can assign into specific indices at the same time variables are declared. These lines are assigning the first rows of wxa and wya (since their first index is 1) into the first columns of wxf and wyf (since their second index is 1). MATLAB will also expand an array if you assign past its end.
Without seeing the rest of the code, I don't really know what c and d are doing. If c is initialized to 1 before the loop and there's something like c = d+1; later, then it would be that your variables wxf, wyf, and wzf are being initialized on the first iteration of the loop and expanded on later iterations. This is a common (if frowned upon) pattern in MATLAB. If this is the case, you'd replicate it in Python by initializing to an empty array before the loop and using the array's extend() method inside the loop (though I bet it's frowned upon in Python, as well). But really, we need you to edit your question to include a, b, c, and d if you want to be sure this is really the case.
Third, EulerMethod and generate_wind are functions, not classes. EulerMethod returns two outputs, which you'd probably replicate in Python by returning a tuple.
[input(a:b,:), t(a:b,1)] = EulerMethod(...); is assigning the two outputs of EulerMethod into specific ranges of input and t. Similar concepts as in points 1 and 2 apply here.
Those are the answers to what you expressed confusion about. Without sitting down and doing it myself, I don't have enough experience in Python to give more Python-specific recommendations.

External "noise" library only produces identical values for list elements

I am trying to teach myself how to use the Python external "noise" library that can be found on GitHub here. I'm trying to work through the tutorial on the Red Blob Games website here. However, I'm not sure how to actually make it do anything. I've read the help text that appears when I type help(noise) into the console, but there doesn't seem to be much information available.
Right now, it just prints 50 rows and columns worth of 0.0 float elements. If I change the arguments I put into noise.pnoise2(nx, ny) I can get different values, but all the values are still identical. I have checked the addresses of each row in the 2D list I create, and they aren't pointing the same place.
I am just beginning to learn about Perlin Noise, and I don't need it to actually do anything useful. I just want to see the numbers it generates.
How can I get my code to produce different float values?
import noise
height = 50
width = 50
mapList = []
for y in range(height):
row = []
for x in range(width):
nx = x/width - 0.5
ny = y/height - 0.5
row.append(noise.pnoise2(nx, ny))
mapList.append(row)
for row in mapList:
print(row)
Since you're on Python 2, the regular / division floors the answer. You'll need to use from __future__ import division to get the true decimal result when using /.

Tick tack toe: Python numbers displaying differently in different contexts for no reason

I'm not sure exactly how to describe this problem, so apologies if the title was insufficient.
I'm trying to make a piece of code that will put all possible tick-tack-toe boards (0 is blank, 1 is X, 2 is O) into a 2d array (list of lists). I've successfully found a way to do this:
import math
boxes = []
for m in range (0, 19683):
boxes.append([m%3, int(math.floor((m%9)/3))])
print(boxes)
And it works. But instead of typing out the next seven list items, I thought it would be easier to iterate over them like so:
boxes = []
for m in range (0, 19683):
boxes.append([])
for s in range (0,9):
boxes[m].append(int(math.floor((m%(3**(m+1)))/(3**m))))
print(boxes)
and it just gave me a big array of zeros! I have no idea why changing it to iteration would do this; I tried with both ** and pow(). DOes anyone know what the problem is?
Looking at your code, I think in your second example you could have meant:
boxes[m].append(int(math.floor((m%(3**(s+1)))/(3**s))))
Also, you can utilise itertools.product() to achieve the same kind of outcome:
boxes = list(product([0, 1, 2], repeat=9))
Let's look at that inner expression:
math.floor(m%(3**(m+1)))/(3**m)
The numerator is simply m: you take it modulus 3^(m+1), which will be larger than m. The next step is then m / (3**m) -- taken as an integer, this is 0.
In short, your algebra is off.
I recommend that you use itertools.product to get the output you want.

In python is it better to have a series of += or to append to a list and then sum?

Both of these bits of code do the same thing:
g = 1
g += 2
g += 17
print g
g = []
g.append(1)
g.append(2)
g.append(17)
print sum(g)
I was just wondering if one of these ways is "better" or more Python than the other.
My own testing with the following bit of code:
import time
n = 1000000
A = time.clock()
w = 0
for i in range(n):
w += i
print w, time.clock() - A
A = time.clock()
g = []
for i in range(n):
g.append( i )
print sum(g), time.clock() - A
seems to indicate that the first method is slightly faster, but I may be missing something.
Or I may be missing an entirely better way to perform this type of operation. Any input would be welcome.
It's not a matter of being Pythonic, it's a matter of what you want to achieve.
Do you want to save the values that constitute the sum so they can be referred to later? If so, use a list. If not, then why even bother with a list? It would just be a convoluted and less efficient way to do the same thing -- just add the values up. The direct addition method will obviously be faster because all you're doing is adding to a variable (very cheap), instead of mutating a list (which has a greater cost). Not to mention the evident memory advantage of using the direct addition approach, since you wouldn't be storing useless numbers.
Method A is
Add the integers
Method B is
Create a list of integers
Add the integers
If all you want to do is
Add the integers
I'd go with method A.
The only reason to have the second method is if you plan to use the list g elsewhere in the code. Otherwise, there isn't a reason to do it the second way. Making a list and then summing its values is a lot more costly then just incrementing a variable.
Moreover, if incrementing g is your goal, then why not do that? "Explicit is better than implicit" is a motto of Python. The first method explicitly increments g.
Also, the list may confuse people. When they see your code, they will think you need the list and plan to use it elsewhere. Not to mention that g is now a list. If g is supposed to be a number, having it be a list is not good and can lead to problems.
Finally, the first solution has less syntax (always a plus if it does the same job efficiently).
So, I'd go with method 1.
Absolutely the first, for many reason, first of all memory allocation (N integer instead just one) and performance: in real world application the GC overhead would pop out.
edit: disregard this, I can see now that it is not generally true, and is only true for specific values of x.
Ok, so I can see that making a list should be inefficient, but then why does fun2 run more quickly in this instance? Doesn't it essentially create a list and then sum over it?
import timeit
def fun1(x):
w = 0
for i in range(x):
w += i
return w
def fun2(x):
return sum([i for i in range(x)])
timer = timeit.Timer(stmt='fun1(10000)', setup='from __main__ import fun1')
print timer.timeit(number=10000)
timer = timeit.Timer(stmt='fun2(10000)', setup='from __main__ import fun2')
print timer.timeit(number=10000)

create an array from a txt file

I'm new in python and I have a problem.
I have some measured data saved in a txt file.
the data is separated with tabs, it has this structure:
0 0 -11.007001 -14.222319 2.336769
i have always 32 datapoints per simulation (0,1,2,...,31) and i have 300 simulations (0,1,2...,299), so the data is sorted at first with the number of simulation and then the number of the data point.
The first column is the simulation number, the second column is the data point number and the other 3 columns are the x,y,z coordinates.
I would like to create a 3d array, the first dimension should be the simulation number, the second the number of the datapoint and the third the three coordinates.
I already started a bit and here is what I have so far:
## read file
coords = [x.split('\t') for x in
open(f,'r').read().replace('\r','')[:-1].split('\n')]
## extract the information you want
simnum = [int(x[0]) for x in coords]
npts = [int(x[1]) for x in coords]
xyz = array([map(float,x[2:]) for x in coords])
but I don't know how to combine these 2 lists and this one array.
in the end i would like to have something like this:
array = [simnum][num_dat_point][xyz]
thanks for your help.
I hope you understand my problem, it's my first posting in a python forum, so if I did anything wrong, I'm sorry about this.
thanks again
you can combine them with zip function, like so:
for sim, datapoint, x, y, z in zip(simnum, npts, *xyz):
# do your thing
or you could avoid list comprehensions altogether and just iterate over the lines of the file:
for line in open(fname):
lst = line.split('\t')
sim, datapoint = int(lst[0]), int(lst[1])
x, y, z = [float(i) for i in lst[2:]]
# do your thing
to parse a single line you could (and should) do the following:
coords = [x.split('\t') for x in open(fname)]
This seems like a good opportunity to use itertools.groupby.
import itertools
import csv
file = open("data.txt")
reader = csv.reader(file, delimiter='\t')
result = []
for simnumberStr, rows in itertools.groupby(reader, key=lambda t: t[0]):
simData = []
for row in rows:
simData.append([float(v) for v in row[2:]])
result.append(simData)
file.close()
This will create a 3 dimensional list named 'result'. The first index is the simulation number, and the second index is the data index within that simulation. The value is a list of integers containing the x, y, and z coordinate.
Note that this assumes the data is already sorted on simulation number and data number.
According to the zen of python, flat is better than nested. I'd just use a dict.
import csv
f = csv.reader(open('thefile.csv'), delimiter='\t',
quoting=csv.QUOTE_NONNUMERIC)
result = {}
for simn, dpoint, c1, c2, c3 in f:
result[simn, dpoint] = c1, c2, c3
# pretty-prints the result:
from pprint import pprint
pprint(result)
You could be using many different kinds of containers for your purposes, but none of them has array as an unqualified name -- Python has a module array which you can import from the standard library, but the array.array type is too limited for your purposes (1-D only and with elementary types as contents); there's a popular third-party extension known as numpy, which does have a powerful numpy.array type, which you could use if you has downloaded and installed the extension -- but as you never even once mention numpy I doubt that's what you mean; the relevant builtin types are list and dict. I'll assume you want any container whatsoever -- but if you could learn to use precise terminology in the future, that will substantially help you AND anybody who's trying to help you (say list when you mean list, array only when you DO mean array, "container" when you're uncertain about what container to use, and so forth).
I suggest you look at the csv module in the standard library for a more robust way to reading your data, but that's a separate issue. Let's start from when you have the coords list of lists of 5 strings each, each sublist with strings representing two ints followed by three floats. Two more key aspects need to be specified...
One key aspect you don't tell us about: is the list sorted in some significant way? is there, in particular, some significant order you want to keep? As you don't even mention either issue, I will have to assume one way or another, and I'll assume that there isn't any guaranteed nor meaningful order; but, no repetition (each pair of simulation/datapoint numbers is not allowed to occur more than once).
Second key aspect: are there the same number of datapoints per simulation, in increasing order (0, 1, 2, ...), or is that not necessarily the case (and btw, are the simulation themselves numbered 0, 1, 2, ...)? Again, no clue from you on this indispensable part of the specs -- note how many assumptions you're forcing would-be helpers to make by just not telling us about such obviously crucial aspects. Don't let people who want to help you stumble in the dark: rather, learn to ask questions the smart way -- this will save untold amounts of time to yourself AND would-be helpers, and give you higher-quality and more relevant help, so, why not do it? Anyway, forced to make yet another assumption, I'll have to assume nothing at all is known about the simulation numbers nor about the numers of datapoints in each simulation.
With these assumptions dict emerges as the only sensible structure to use for the outer container: a dictionary whose key is a tuple with two items, simulation number then datapoint number within the simulation. The values may as well be tuple, too (with three floats each), since it does appear that you have exactly 3 coordinates per line.
With all of these assumptions...:
def make_container(coords):
result = dict()
for s, d, x, y, z in coords:
key = int(s), int(d)
value = float(x), float(y), float(z)
result[key] = value
return result
It's always best, and fastest, to have all significant code within def statements (i.e. as functions to be called, possibly with appropriate arguments), so I'm presenting it this way. make_container returns a dictionary which you can address with the simulation number and datapoint number; for example,
d = make_container(coords)
print d[0, 0]
will print the x, y, z for dp 0 of sim 0, assuming one exists (you would get an error if such a sim/dp combination did not exist). dicts have many useful methods, e.g. changing the print statement above to
print d.get((0, 0))
(yes, you do need double parentheses here -- inner ones to make a tuple, outer ones to call get with that tuple as its single argument), you'd see None, rather than get an exception, if there was no such sim/dp combinarion as (0, 0).
If you can edit your question to make your specs more precise (perhaps including some indication of ways you plan to use the resulting container, as well as the various key aspects I've listed above), I might well be able to fine-tune this advice to match your need and circumstances much better (and so might ever other responder, regarding their own advice!), so I strongly recommend you do so -- thanks in advance for helping us help you!-)
essentially the difficulty is what happens if different simulations have different numbers of points.
You will therefore need to dimension an array to the appropriate sizes first.
t should be an array of at least max(simnum) x max(npts) x 3.
To eliminate confusion you should initialise with not-a-number,
this will allow you to see missing points.
then use something like
for x in coords:
t[int(x[0])][int(x[1])][0]=float(x[3])
t[int(x[0])][int(x[1])][1]=float(x[4])
t[int(x[0])][int(x[1])][2]=float(x[5])
is this what you meant?
First I'd point out that your first data point appears to be an index, and wonder if the data is therefore important or not, but whichever :-)
def parse(line):
mch = re.compile('^(\d+)\s+(\d+)\s+([-\d\.]+)\s+([-\d\.]+)\s+([-\d\.]+)$')
m = mch.match(line)
if m:
l = m.groups()
(idx,data,xyz) = (int(l[0]),int(l[1]), map(float, l[2:]))
return (idx, data, xyz)
return None
finaldata = []
file = open("data.txt",'r')
for line in file:
r = parse(line)
if r is not None:
finaldata.append(r)
Final data should have output along the lines of:
[(0, 0, [-11.007001000000001, -14.222319000000001, 2.3367689999999999]),
(1, 0, [-11.007001000000001, -14.222319000000001, 2.3367689999999999]),
(2, 0, [-11.007001000000001, -14.222319000000001, 2.3367689999999999]),
(3, 0, [-11.007001000000001, -14.222319000000001, 2.3367689999999999]),
(4, 0, [-11.007001000000001, -14.222319000000001, 2.3367689999999999])]
This should be pretty robust about dealing w/ the whitespace issues (tabs spaces whatnot)...
I also wonder how big your data files are, mine are usually large so being able to process them in chunks or groups become more important... Anyway this will work in python 2.6.
Are you sure a 3d array is what you want? It seems more likely that you want a 2d array, where the simulation number is one dimension, the data point is the second, and then the value stored at that location is the coordinates.
This code will give you that.
data = []
for coord in coords:
if coord[0] not in data:
data[coord[0]] = []
data[coord[0]][coord[1]] = (coord[2], coord[3], coord[4])
To get the coordinates at simulation 7, data point 13, just do data[7][13]

Categories

Resources