I am new to Python, and am struggling with a task that I assume is an extremely simple one for an experienced programmer.
I am trying to create a list of lists of coordinates for different lines. For instance:
list = [ [(x,y), (x,y), (x,y)], [Line 2 Coordinates], ....]
I have the following code:
masterlist_x = list(range(-5,6))
oneline = []
data = []
numberoflines = list(range(2))
i = 1
for i in numberoflines:
    slope = randint(-5,5)
    y_int = randint(-10,10)
    for element in masterlist_x:
        oneline.append((element,slope * element + y_int))
    data.append(oneline)
The variable that should hold the coordinates of a single line (oneline) ends up holding the coordinates of both lines.
I know this is an issue with the outer looping mechanism, but I am not sure how to proceed.
Any and all help is much appreciated. Thank you very much!
@khuynh is right: you simply had oneline = [] in the wrong place, so the coordinates of every line ended up in the same list.
Also, there are a couple of unnecessary things in your code:
you don't need to wrap range() in list(); you can iterate over a range directly with for
you also don't need to initialise i before the for loop; the loop assigns it itself
i is never actually used, which is fine; the Python convention for an unused variable is _
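To see why the position of oneline = [] matters, here is a minimal sketch. The integer values are placeholders, not the original coordinates. When the list is created once outside the loop, every data.append(oneline) stores a reference to the same list object:

```python
# Buggy pattern: one list object is created once and reused for every "line".
data = []
oneline = []
for n in range(2):
    oneline.append(n)
    data.append(oneline)   # appends the same object again

# Fixed pattern: a fresh list is created on each pass of the outer loop.
fixed = []
for n in range(2):
    oneline = []
    oneline.append(n)
    fixed.append(oneline)

print(data)   # [[0, 1], [0, 1]]
print(fixed)  # [[0], [1]]
```

In the buggy pattern data[0] and data[1] are the same object, so both show everything that was ever appended.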
Fixed version:
from random import randint
masterlist_x = range(-5,6)
data = []
numberoflines = range(2)
for _ in numberoflines:
    oneline = []
    slope = randint(-5,5)
    y_int = randint(-10,10)
    for element in masterlist_x:
        oneline.append((element,slope * element + y_int))
    data.append(oneline)
print(data)
You can also run it online: https://repl.it/repls/GreedyRuralProduct
I suspect the whole thing could also be written with much less code, and more simply, as a list comprehension.
UPDATE: the inner loop is indeed very suitable for a list comprehension. The outer loop could probably be turned into one as well, making the whole thing two nested list comprehensions, but I only got confused when I tried that. This version is clear, though:
from random import randint
masterlist_x = range(-5,6)
data = []
numberoflines = range(2)
for _ in numberoflines:
    slope = randint(-5,5)
    y_int = randint(-10,10)
    oneline = [(element, slope * element + y_int)
               for element in masterlist_x]
    data.append(oneline)
print(data)
Again on repl.it too: https://repl.it/repls/SoupyIllustriousApplicationsoftware
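One way to turn the outer loop into a comprehension as well, without the nesting getting confusing, is to move the random slope and intercept into a small helper. This is only a sketch: the helper name random_line is made up for illustration and is not part of the original code.

```python
from random import randint


def random_line(xs):
    """Return the (x, y) points of one random line over the given x values."""
    slope = randint(-5, 5)
    y_int = randint(-10, 10)
    return [(x, slope * x + y_int) for x in xs]


masterlist_x = range(-5, 6)

# Outer comprehension: one independent list of points per line.
data = [random_line(masterlist_x) for _ in range(2)]
print(data)
```

Each call to random_line builds a new list, so the lines cannot end up sharing one list by accident.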
I am trying to create arrays of fixed size within a while loop. Since I do not know in advance how many arrays I have to create, I am using a loop to initialise them. The problem I am facing is with the array declaration: I would like the name of each array to end with the index of the while loop, so that it will be useful later in my calculations. I do not expect to find an easy way out, but it would be great if someone could point me in the right direction.
I tried using arrayname + str(i). This returns the error 'Can't assign to operator'.
#parse through the Load vector sheet to load the values of the stress vector into the dataframe
Loadvector = x2.parse('Load_vector')
Lvec_rows = len(Loadvector.index)
Lvec_cols = len(Loadvector.columns)
i = 0
while i < Lvec_cols:
    y_values + str(i) = np.zeros(Lvec_rows)
    i = i +1
I expect arrays with names arrayname1, arrayname2 ... to be created.
I think the title is somewhat misleading.
An easy way to do this would be to use a dictionary:
dict_of_array = {}
i = 0
while i < Lvec_cols:
    # the key must be built from a string literal; y_values is not a defined variable
    dict_of_array['y_values' + str(i)] = np.zeros(Lvec_rows)
    i = i + 1
You can then access the array for index 1 with dict_of_array['y_values1'].
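If you want to create a batch of separately named arrays, try:
i = 0
while i < Lvec_cols:
    # the name prefix is passed as a string so that the generated statement reads y_values0 = ...
    exec('{}{} = np.zeros(Lvec_rows)'.format('y_values', i))
    i = i + 1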
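The dictionary approach can also be written as a dict comprehension, which avoids both the manual counter and exec. The following is a minimal sketch under stated assumptions: n_rows and n_cols are hypothetical values standing in for Lvec_rows and Lvec_cols, and plain lists of zeros stand in for np.zeros so that the example runs without NumPy.

```python
n_rows = 4   # stands in for Lvec_rows (hypothetical value)
n_cols = 3   # stands in for Lvec_cols (hypothetical value)

# One zero-filled list per column, keyed by a generated name.
arrays = {"y_values" + str(i): [0.0] * n_rows for i in range(n_cols)}

arrays["y_values1"][0] = 2.5   # look an array up by name and modify it
print(sorted(arrays))          # ['y_values0', 'y_values1', 'y_values2']
```

With NumPy available, [0.0] * n_rows would simply become np.zeros(n_rows).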
I am still new to Python but am using it for my linguistics research.
So I am doing some research into toponyms, and I got a list of input data from a topographic institution, which looks like the following:
Official_Name, tab, Dialect_Name, tab, Administrative_district, Topographic_district, Y_coordinates, X_coordinates, Longitude, Latitude.
So, I defined a class:
class MacroTop:
    def __init__(self, Official_Name, Dialect_Name, Adm_District, Topo_District, Y, X, Long, Lat):
        self.Official_Name = Official_Name
        self.Dialect_Name = Dialect_Name
        self.Adm_District = Adm_District
        self.Topo_District = Topo_District
        self.Y = Y
        self.X = X
        self.Long = Long
        self.Lat = Lat
So I wanted to open my .txt file with open() and use a loop to read the data into the class, but it did not work.
The result I want is to be able to access an attribute of the class, say Dialect_Name, and look through all the entries for that attribute. I can already do that inside the loop, but I wanted to define a class so that I could do more manipulation afterwards.
My loop:
with open("locLuxAll.txt", "r") as topo_list:
    lines = topo_list.readlines()
for line in lines:
    line = line.split('\t')
    print(line)
    print(line[0]) # This would access all the data that is characterized as Official_Name
I tried to make another loop:
for i in range(0-len(lines)):
    lines[i] = MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(line[4]), str(line[5]), str(line[6]), str(line[7]))
But that did not seem to work.
This line fails:
for i in range(0-len(lines)):
0 - len(lines) is a negative number, and range() with a negative stop value is empty, so the loop body never runs:
In [11]: [i for i in range(-200)]
Out[11]: []
EDIT:
Your code is hard to follow: you have for i in range(len(lines)), but inside that loop you index into the line variable. Where does it come from? Presumably it is left over from the previous loop, in which case it always refers to the last line. First of all, I would not write back into the lines list, since it comes from readlines. Create a new list instead; you don't need the i variable, because the entries will be kept in order anyway.
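class_lines = []
for line in lines:
    # each entry of lines is still a raw string, so split it into its tab-separated fields first
    line = line.rstrip('\n').split('\t')
    class_lines.append(MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(line[4]), str(line[5]), str(line[6]), str(line[7])))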
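Or even with a list comprehension (the inner generator splits each raw line into its fields):
class_lines = [MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(
    line[4]), str(line[5]), str(line[6]), str(line[7]))
    for line in (raw.rstrip('\n').split('\t') for raw in lines)]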
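As a rough end-to-end sketch of the goal stated in the question (collecting one attribute, such as Dialect_Name, across all entries), the standard csv module can do the tab splitting. The sample rows below are made-up placeholders standing in for the contents of locLuxAll.txt, and io.StringIO stands in for the opened file; the len(row) == 8 check is an assumption about how malformed rows should be handled.

```python
import csv
import io


class MacroTop:
    def __init__(self, Official_Name, Dialect_Name, Adm_District,
                 Topo_District, Y, X, Long, Lat):
        self.Official_Name = Official_Name
        self.Dialect_Name = Dialect_Name
        self.Adm_District = Adm_District
        self.Topo_District = Topo_District
        self.Y = Y
        self.X = X
        self.Long = Long
        self.Lat = Lat


# Made-up placeholder rows; the real data would come from open("locLuxAll.txt").
sample = (
    "NameA\tDialectA\tAdm1\tTopo1\t10\t20\t6.1\t49.6\n"
    "NameB\tDialectB\tAdm2\tTopo2\t30\t40\t6.2\t49.7\n"
)

# csv.reader splits each line on tabs and drops the trailing newline.
reader = csv.reader(io.StringIO(sample), delimiter="\t")

# Skip rows that do not have exactly the eight expected columns.
places = [MacroTop(*row) for row in reader if len(row) == 8]

dialect_names = [place.Dialect_Name for place in places]
print(dialect_names)  # ['DialectA', 'DialectB']
```

All fields arrive as strings, so numeric columns such as the coordinates would still need an explicit float() conversion if calculations are planned.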
If I wanted to perform something like Levene's test of equal variances via scipy stats, which produces two outputs (the test statistic and p-value) for all the data in a dictionary, how would I append the outputs for each test to two different lists? I tried the code below:
test_stat = []
p_value = []
for i in range(0, n_data):
    for j in range(1, n_name):
        test_stat[i], p_value[i] = scipy.stats.levene(data[i][name[j-1]],
                                                      data[i][name[j]],
                                                      center='median')
But this clearly isn't the way to go about it, as I keep getting an IndexError: list assignment index out of range.
Any suggestions would be greatly appreciated. Thanks!
Not everything needs to be in a single line... This should work fine:
test_stats = []
p_values = []
for i in range(0, n_data):
    for j in range(1, n_name):
        test_stat, p_value = scipy.stats.levene(data[i][name[j-1]],
                                                data[i][name[j]],
                                                center='median')
        test_stats.append(test_stat)
        p_values.append(p_value)
Though of course this will add n_data * (n_name - 1) entries to each list.
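The same "unpack, then collect" pattern can be sketched without the index arithmetic by pairing neighbouring names with zip and transposing the results at the end. This is only an illustration: fake_test is a hypothetical stand-in for scipy.stats.levene so that the example runs without SciPy, and the contents of data and name are made-up placeholders.

```python
def fake_test(a, b):
    """Hypothetical stand-in for scipy.stats.levene: returns a pair of numbers."""
    return max(a) - min(a), max(b) - min(b)


name = ["a", "b", "c"]
data = [
    {"a": [1, 2, 3], "b": [2, 4, 6], "c": [0, 5, 10]},
    {"a": [1, 1, 1], "b": [3, 4, 5], "c": [7, 7, 9]},
]

# zip(name, name[1:]) yields each pair of neighbouring names.
results = [
    fake_test(group[first], group[second])
    for group in data
    for first, second in zip(name, name[1:])
]

# Transpose the list of pairs into two parallel tuples.
test_stats, p_values = zip(*results)
print(len(test_stats))  # 4, i.e. len(data) * (len(name) - 1)
```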
I am trying to append a lengthy list of rows to the same variable. It works great for the first thousand or so iterations in the loop (all of which have the same lengths), but then, near the end of the file, the rows get a bit shorter, and while I still want to append them, I am not sure how to handle it.
The script gives me an out of range error, as expected.
Here is what the part of code in question looks like:
ii = 0
NNCat = []
NNCatelogue = []
while ii <= len(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii], nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
    ii = ii + 1
print NNCatelogue, ii
Any help on this would be greatly appreciated!
I'll answer the question you didn't ask first ;) : how can this code be more pythonic?
Instead of
ii = 0
NNCat = []
NNCatelogue = []
while ii <= len(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii], nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
    ii = ii + 1
you should do
NNCat = []
NNCatelogue = []
for ii, line in enumerate(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii],
             nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
During each pass ii will be incremented by one for you and line will be the current line.
As for your short lines, you have two choices:
use a special value (such as None) to fill in when you don't have a real value
check the lengths of nn1, nn2, ..., nn11 to see whether they are long enough
The second solution will be much more verbose, hard to maintain, and confusing. I strongly recommend using None (or another special value you create yourself) as a placeholder when there is no data.
def gvop(vals,indx): #get values or padding
    return vals[indx] if indx<len(vals) else None
# range(len(lines)) behaves the same as xrange here and also works on Python 3
NNCatelogue = [(gvop(ev_id,ii), gvop(nn1,ii), gvop(nn2,ii), gvop(nn3,ii), gvop(nn4,ii),
                gvop(nn5,ii), gvop(nn6,ii), gvop(nn7,ii), gvop(nn8,ii), gvop(nn9,ii),
                gvop(nn10,ii), gvop(nn11,ii)) for ii in range(len(lines))]
By defining this other function to return either the correct value or padding, you can ensure rows are the same length. You can change the padding to anything, if None is not what you want.
Then the list comp creates a list of tuples as before, except containing padding in cases where some of the lines in the input are shorter.
from itertools import izip_longest
NNCatelogue = list(izip_longest(ev_id, nn1, nn2, ... nn11, fillvalue=None))
See the itertools documentation for izip_longest (in Python 3 it is named zip_longest). Do yourself a favour and skip the list() around the iterator if you don't need it. In many cases you can use the iterator just as well as the list, and you save a lot of memory, especially if the lists you are grouping together are long.
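For reference, a small self-contained sketch of the same idea on Python 3, where the function is itertools.zip_longest. The three short lists are made-up placeholders for ev_id, nn1 and nn2; the remaining columns would be passed the same way.

```python
from itertools import zip_longest

# Placeholder columns; nn2 is deliberately one entry shorter.
ev_id = [1, 2, 3]
nn1 = [10, 20, 30]
nn2 = [100, 200]

# Shorter columns are padded with fillvalue instead of raising IndexError.
rows = list(zip_longest(ev_id, nn1, nn2, fillvalue=None))
print(rows)  # [(1, 10, 100), (2, 20, 200), (3, 30, None)]
```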
I have read this answer, which is potentially the best way to randomize a list of strings in Python. I'm just wondering whether it is also the most efficient way to do it, because I have a list of about 30 million elements, built with the following code:
import json
from sets import Set
from random import shuffle
a = []
for i in range(0,193):
    json_data = open("C:/Twitter/user/user_" + str(i) + ".json")
    data = json.load(json_data)
    for j in range(0,len(data)):
        a.append(data[j]['su'])
new = list(Set(a))
print "Cleaned length is: " + str(len(new))
## Take Cleaned List and Randomize it for Analysis
shuffle(new)
If there is a more efficient way to do it, I'd greatly appreciate any advice on how to do it.
Thanks,
A couple of possible suggestions:
import json
from random import shuffle
a = set()
for i in range(193):
    with open("C:/Twitter/user/user_{0}.json".format(i)) as json_data:
        data = json.load(json_data)
    a.update(d['su'] for d in data)
print("Cleaned length is {0}".format(len(a)))
# Take Cleaned List and Randomize it for Analysis
new = list(a)
shuffle(new)
the only way to know if this is faster is to profile it!
do you prefer sets.Set to the built-in set() for a reason?
I have introduced a with clause (preferred way of opening files, as it guarantees they get closed)
it did not appear that you were doing anything with 'a' as a list except converting it to a set; why not make it a set from the start?
rather than iterate on an index, then do a lookup on the index, I just iterate on the data items...
which makes it easily rewriteable as a generator expression
If you're planning to use shuffle, you're probably better off using the solution from this question. For real.
randomly mix lines of 3 million-line file
Basically, the random number generator behind shuffle has a period that is far too small to reach every possible ordering of 3 million items, let alone 30 million. If you can load the data into memory, then your best bet is to do as they say: assign a random number to each line and sort on it.
See this thread. And here, I did it for you so you don't mess anything up (that's a joke):
import json
import random
from operator import itemgetter
a = set()
for i in range(0,193):
    json_data = open("C:/Twitter/user/user_" + str(i) + ".json")
    data = json.load(json_data)
    a.update(d['su'] for d in data)
# report the size of the de-duplicated set (new does not exist yet at this point)
print "Cleaned length is: " + str(len(a))
# pair each element with a random key, sort on the key, then drop the key
new = [(random.random(), el) for el in a]
new.sort()
new = map(itemgetter(1), new)
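Which of the two approaches is faster is best settled by measuring. The following Python 3 sketch uses the standard timeit module on a small placeholder list; the list size, the repetition count and the two function names are arbitrary choices for illustration, and timings on 30 million real strings may look quite different.

```python
import random
import timeit

items = [str(n) for n in range(10000)]   # small placeholder data set


def shuffle_in_place():
    """Copy the items and shuffle the copy with random.shuffle."""
    work = list(items)
    random.shuffle(work)
    return work


def sort_by_random_key():
    """Pair each item with a random key, sort on the key, then drop the key."""
    decorated = [(random.random(), el) for el in items]
    decorated.sort()
    return [el for _, el in decorated]


for func in (shuffle_in_place, sort_by_random_key):
    seconds = timeit.timeit(func, number=5)
    print(func.__name__, seconds)
```

Both functions return a reordering of the same elements, so the comparison is purely about running time.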
I don't know if it will be any faster but you could try numpy's shuffle.