I am new to loops. I am trying to iterate over all items in a list and, for each item, generate the values between 0 and 2 using that item as the step value. I have tried to use the range function, but cannot get it to work.
The end result should look something like this (doesn't have to be in a pandas dataframe, just for illustrative purposes):
import pandas as pd
import numpy as np
data = {'range_0.5' : [0,0.5,1,1.5,2, np.nan, np.nan, np.nan, np.nan],
'range_0.25' : [0,0.25,0.5,0.75,1,1.25,1.5,1.75,2]}
df = pd.DataFrame(data)
df
Here is what I have tried:
import numpy
x = []
seq = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
for i in seq:
    x = range(0, 2, i)
The following error is thrown:
TypeError Traceback (most recent call last)
Input In [10], in <cell line: 1>()
1 for i in seq:
----> 2 x = range(0, 2, i)
TypeError: 'float' object cannot be interpreted as an integer
How can I properly create my loop?
np.arange()
You can use numpy.arange(), which supports float step values (like range, it excludes the stop value):
import numpy as np
for step in [0.5, 0.25]:
    print(np.arange(0, 2, step).tolist())
Expected output:
[0.0, 0.5, 1.0, 1.5]
[0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
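To include 2, just add one step to the stop value. This is reliable here because 0.5 and 0.25 are exact in binary; with steps such as 0.1, np.arange can return one element more or fewer than expected, which is why the NumPy docs suggest np.linspace for non-integer steps:
for step in [0.5, 0.25]:
    print(np.arange(0, 2 + step, step).tolist())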
Expected output:
[0.0, 0.5, 1.0, 1.5, 2.0]
[0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
np.linspace()
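Alternatively, you can use np.linspace(), which takes the number of points instead of the step and includes the endpoint by default (endpoint=True):
for step in [0.5, 0.25]:
    # round() rather than //, since e.g. 2 // 0.1 gives 19.0, not 20.0
    print(np.linspace(0, 2, round(2 / step) + 1, endpoint=True).tolist())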
Expected output:
[0.0, 0.5, 1.0, 1.5, 2.0]
[0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
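A sketch of how the loop over the question's step list could fill the illustrative table from the question. It assumes pandas and only uses the first three steps to keep the output small; pd.DataFrame pads the shorter columns with NaN, mirroring the np.nan padding in the question:

```python
import numpy as np
import pandas as pd

seq = [0.5, 0.25, 0.125]  # a subset of the step values from the question

columns = {}
for step in seq:
    # number of points from 0 to 2 inclusive; round() guards against a
    # quotient like 3.9999999999999996 being truncated to 3
    n = int(round(2 / step)) + 1
    columns[f"range_{step}"] = pd.Series(np.linspace(0, 2, n))

# Series of different lengths are aligned on their index and the
# shorter ones are padded with NaN
df = pd.DataFrame(columns)
print(df)
```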
I'm reading this article.
In the "Covariance matrix & SVD" section, there are two values, $\sigma_1$ and $\sigma_2$, which are 14.4 and 0.19, respectively.
How can I get these values?
I already calculated the covariance matrix with NumPy:
import numpy as np
a = np.array([[2.9, -1.5, 0.1, -1.0, 2.1, -4.0, -2.0, 2.2, 0.2, 2.0, 1.5, -2.5],
[4.0, -0.9, 0.0, -1.0, 3.0, -5.0, -3.5, 2.6, 1.0, 3.5, 1.0, -4.7]])
cov_mat = (a.shape[1] - 1) * np.cov(a)
print(cov_mat)
# b = np.std(a, axis=1)**0.5
b = (a.shape[1] - 1) * np.std(a, axis=1)**0.5
# b = np.std(cov_mat, axis=1)
# b = np.std(cov_mat, axis=1)**0.5
print(b)
The result is:
[[ 53.46 73.42]
[ 73.42 107.16]]
[15.98102431 19.0154037 ]
No matter what I do, I can't get 14.4 and 0.19.
Are they just wrong values?
Please help me. Thank you in advance.
I don't know why you "un-sampled" your covariance, but the original np.cov output is the matrix whose eigenvalues you want:
np.linalg.eigvalsh(np.cov(a))
Out[]: array([ 0.19403958, 14.4077786 ])
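As a cross-check that ties this back to the "SVD" part of the article's section title (assuming the article's σ values are the variances along the principal axes), the same numbers come out of an SVD of the mean-centred data: the squared singular values divided by n - 1 equal the covariance eigenvalues. A minimal sketch:

```python
import numpy as np

a = np.array([[2.9, -1.5, 0.1, -1.0, 2.1, -4.0, -2.0, 2.2, 0.2, 2.0, 1.5, -2.5],
              [4.0, -0.9, 0.0, -1.0, 3.0, -5.0, -3.5, 2.6, 1.0, 3.5, 1.0, -4.7]])
n = a.shape[1]  # number of observations (columns)

# Route 1: eigenvalues of the sample covariance matrix (ascending order)
eig = np.linalg.eigvalsh(np.cov(a))

# Route 2: SVD of the mean-centred data; s**2 / (n - 1) gives the
# same eigenvalues, in descending order
centred = a - a.mean(axis=1, keepdims=True)
s = np.linalg.svd(centred, compute_uv=False)
from_svd = s**2 / (n - 1)

print(eig)
print(from_svd[::-1])
```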
Given a list:
x = [0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0]
I want to get the indexes of all the values that are not 0 and store them in d['inds']
Then using the indexes in d['inds'] go through the list of x and get the values.
So I would get something like:
d['inds'] = [1, 5, 6, 9]
d['vals'] = [0.87, 0.32, 0.46, 0.10]
I already got the indexes using:
d['inds'] = [i for i,m in enumerate(x) if m != 0]
but I'm not sure how to get d['vals']
d['vals'] = [x[i] for i in d['inds']]
Better yet, do both at once:
vals = []
inds = []
for i, v in enumerate(x):
    if v != 0:
        vals.append(v)
        inds.append(i)
d['vals']=vals
d['inds']=inds
or
import numpy as np
inds, vals = np.array([(i, v) for i, v in enumerate(x) if v != 0]).T
# the array is float64, so cast the indices back to int
d['inds'], d['vals'] = inds.astype(int), vals
You can use NumPy; its boolean indexing is designed for tasks like this one:
import numpy as np
x = np.array([0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0])
x[x!=0]
Out: array([ 0.87, 0.32, 0.46, 0.1 ])
And if you're still interested in the indices:
np.argwhere(x!=0)
Out:
array([[1],
[5],
[6],
[9]], dtype=int64)
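If you want both lists in d as flat Python lists, np.flatnonzero (which returns a 1-D index array rather than argwhere's column of indices) may be more convenient. A sketch using the 'inds'/'vals' keys from the question:

```python
import numpy as np

x = np.array([0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0])

# flatnonzero gives the positions of the non-zero entries as a 1-D array
inds = np.flatnonzero(x)
d = {'inds': inds.tolist(), 'vals': x[inds].tolist()}
print(d)
```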
You can use a dict comprehension:
m = {i:j for i,j in enumerate(x) if j!=0}
list(m.keys())
Out[183]: [1, 5, 6, 9]
list(m.values())
Out[184]: [0.87, 0.32, 0.46, 0.1]
If you want to save this in a dictionary d, you can do:
d = {}
d['vals'] = list(m.values())
d['inds'] = list(m.keys())
d
{'vals': [0.87, 0.32, 0.46, 0.1], 'inds': [1, 5, 6, 9]}
Using Pandas:
x = [0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0]
import pandas as pd
data = pd.DataFrame(x)
inds = data[data[0]!=0.0].index
print(inds)
Output: Int64Index([1, 5, 6, 9], dtype='int64') (pandas 2.x prints Index([1, 5, 6, 9], dtype='int64') instead)
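The same boolean mask also gives the values. A sketch that fills a dict d with both lists, using the 'inds'/'vals' keys from the question:

```python
import pandas as pd

x = [0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0]

s = pd.Series(x)
nonzero = s[s != 0]  # the mask keeps the original integer labels
d = {'inds': nonzero.index.tolist(), 'vals': nonzero.tolist()}
print(d)
```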
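Much easier:
import pandas as pd
df = pd.DataFrame()
df['vals'] = list(filter(None, x))
df['idx'] = df['vals'].apply(x.index)
Explanation:
filter(None, x) keeps only the truthy items of x, i.e. the non-zero values (None means "no function", so each item is tested for truthiness directly).
Then pandas apply goes through the 'vals' column and looks up each value's index in the list x. Note that list.index returns the first match, so if the same non-zero value appears more than once, every copy gets the index of its first occurrence.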
I have a list as follows.
mylist= [0.0, 0.4, 0.81, 1.0, 0.9, 20.7, 0.0, 0.8, 1.0, 20.7]
I want to get the indexes of the top 4 elements of the list (i.e. [5, 9, 3, 8]) and remove the indexes whose value is less than or equal to 1 (<= 1).
Therefore my final output should be [5, 9]
My current code is as follows:
sorted_mylist = sorted(mylist, reverse = True)[:4]
for ele in sorted_mylist:
    if ele > 1:
        print(mylist.index(ele))
However, it returns [5, 5], which is incorrect.
How can I fix this in Python?
You should use enumerate:
mylist= [0.0, 0.4, 0.81, 1.0, 0.9, 20.7, 0.0, 0.8, 1.0, 20.7]
indices = [index for index, value in sorted(enumerate(mylist), reverse=True, key=lambda x: x[1]) if value > 1][:4]
# [5, 9]
You can sort the values together with their indices, so that each index is easy to retrieve later:
Code:
sorted_mylist = sorted(((v, i) for i, v in enumerate(mylist)), reverse=True)
Test Code:
mylist = [0.0, 0.4, 0.81, 1.0, 0.9, 20.7, 0.0, 0.8, 1.0, 20.7]
sorted_mylist = sorted(((v, i) for i, v in enumerate(mylist)), reverse=True)
result = []
for i, (value, index) in enumerate(sorted_mylist):
    if i == 4:
        break
    if value > 1:
        result.append(index)
print(result)
Results:
[9, 5]
All the answers above are good, but if you don't insist on keeping your current code and just want to solve the problem itself, here is another option with pandas, just FYI:
import pandas as pd
mylist= [0.0, 0.4, 0.81, 1.0, 0.9, 20.7, 0.0, 0.8, 1.0, 20.7]
s = pd.Series(mylist).sort_values(ascending=False).head(4)  # top 4 values, keeping their original indices
s = s[s > 1]
print(s.index.tolist())
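Another option, without pandas, is heapq.nlargest with a key, which the Python docs describe as equivalent to sorted(iterable, key=key, reverse=True)[:n]. Ranking the indices by the value they point at avoids the mylist.index() lookup that returned 5 twice. A sketch:

```python
import heapq

mylist = [0.0, 0.4, 0.81, 1.0, 0.9, 20.7, 0.0, 0.8, 1.0, 20.7]

# indices of the 4 largest values, ranked by the value at each index
top4 = heapq.nlargest(4, range(len(mylist)), key=mylist.__getitem__)
# keep only the indices whose value is greater than 1
result = [i for i in top4 if mylist[i] > 1]
print(top4)
print(result)
```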
I have a symmetric, multi-index dataframe from which I want to systematically extract data:
import pandas as pd
df_index = pd.MultiIndex.from_arrays(
[["A", "A", "B", "B"], [1, 2, 3, 4]], names = ["group", "id"])
df = pd.DataFrame(
[[1.0, 0.5, 0.3, -0.4],
[0.5, 1.0, 0.9, -0.8],
[0.3, 0.9, 1.0, 0.1],
[-0.4, -0.8, 0.1, 1.0]],
index=df_index, columns=df_index)
I want a function extract_vals that can return all values related to elements in the same group, EXCEPT for the diagonal AND elements must not be double-counted. Here are two examples of the desired behavior (order does not matter):
A_vals = extract_vals("A", df) # [0.5, 0.3, -0.4, 0.9, -0.8]
B_vals = extract_vals("B", df) # [0.3, 0.9, 0.1, -0.4, -0.8]
My question is similar to this question on SO, but my situation is different because I am using a multi-index dataframe.
Finally, to make things more fun, please consider efficiency because I'll be running this many times on much bigger dataframes. Thanks very much!
EDIT:
Happy001's solution is awesome. I came up with a method myself based on the logic of extracting the elements where target is NOT in BOTH the rows and columns, and then extracting the lower triangle of those elements where target IS in BOTH the rows and columns. However, Happy001's solution is much faster.
First, I created a more complex dataframe to make sure both methods are generalizable:
import pandas as pd
import numpy as np
df_index = pd.MultiIndex.from_arrays(
[["A", "B", "A", "B", "C", "C"], [1, 2, 3, 4, 5, 6]], names=["group", "id"])
df = pd.DataFrame(
[[1.0, 0.5, 1.0, -0.4, 1.1, -0.6],
[0.5, 1.0, 1.2, -0.8, -0.9, 0.4],
[1.0, 1.2, 1.0, 0.1, 0.3, 1.3],
[-0.4, -0.8, 0.1, 1.0, 0.5, -0.2],
[1.1, -0.9, 0.3, 0.5, 1.0, 0.7],
[-0.6, 0.4, 1.3, -0.2, 0.7, 1.0]],
index=df_index, columns=df_index)
Next, I defined both versions of extract_vals (the first is my own):
def extract_vals(target, multi_index_level_name, df):
    # Extract entries where target is in the rows but NOT also in the columns
    target_in_rows_but_not_in_cols_vals = df.loc[
        df.index.get_level_values(multi_index_level_name) == target,
        df.columns.get_level_values(multi_index_level_name) != target]
    # Extract entries where target is in the rows AND in the columns
    target_in_rows_and_cols_df = df.loc[
        df.index.get_level_values(multi_index_level_name) == target,
        df.columns.get_level_values(multi_index_level_name) == target]
    # np.bool was removed in NumPy 1.24; the built-in bool works everywhere
    mask = np.triu(np.ones(target_in_rows_and_cols_df.shape), k=1).astype(bool)
    vals_with_nans = target_in_rows_and_cols_df.where(mask).values.flatten()
    target_in_rows_and_cols_vals = vals_with_nans[~np.isnan(vals_with_nans)]
    # Append both arrays of extracted values
    vals = np.append(target_in_rows_but_not_in_cols_vals, target_in_rows_and_cols_vals)
    return vals
def extract_vals2(target, multi_index_level_name, df):
    # Get indices for what you want to extract and then extract all at once
    coord = [[i, j] for i in range(len(df)) for j in range(len(df)) if i < j and (
        df.index.get_level_values(multi_index_level_name)[i] == target or (
            df.columns.get_level_values(multi_index_level_name)[j] == target))]
    return df.values[tuple(np.transpose(coord))]
I checked that both functions returned output as desired:
# Expected values
e_A_vals = np.sort([0.5, 1.0, -0.4, 1.1, -0.6, 1.2, 0.1, 0.3, 1.3])
e_B_vals = np.sort([0.5, 1.2, -0.8, -0.9, 0.4, -0.4, 0.1, 0.5, -0.2])
e_C_vals = np.sort([1.1, -0.9, 0.3, 0.5, 0.7, -0.6, 0.4, 1.3, -0.2])
# Sort because order doesn't matter
assert np.allclose(np.sort(extract_vals("A", "group", df)), e_A_vals)
assert np.allclose(np.sort(extract_vals("B", "group", df)), e_B_vals)
assert np.allclose(np.sort(extract_vals("C", "group", df)), e_C_vals)
assert np.allclose(np.sort(extract_vals2("A", "group", df)), e_A_vals)
assert np.allclose(np.sort(extract_vals2("B", "group", df)), e_B_vals)
assert np.allclose(np.sort(extract_vals2("C", "group", df)), e_C_vals)
And finally, I checked speed:
## Test speed
import time
# Method 1
start1 = time.time()
for ii in range(10000):
    out = extract_vals("C", "group", df)
elapsed1 = time.time() - start1
print(elapsed1)  # 28.5 sec
# Method 2
start2 = time.time()
for ii in range(10000):
    out2 = extract_vals2("C", "group", df)
elapsed2 = time.time() - start2
print(elapsed2)  # 10.9 sec
I don't assume df has the same columns and index (of course, they can be the same).
import numpy as np
def extract_vals(group_label, df):
    # (i, j) positions above the diagonal whose row or column is in group_label
    coord = [[i, j] for i in range(len(df)) for j in range(len(df))
             if i < j and (df.index.get_level_values('group')[i] == group_label
                           or df.columns.get_level_values('group')[j] == group_label)]
    # fancy-index all selected positions at once
    return df.values[tuple(np.transpose(coord))]
print(extract_vals('A', df))
print(extract_vals('B', df))
result:
[ 0.5 0.3 -0.4 0.9 -0.8]
[ 0.3 -0.4 0.9 -0.8 0.1]
Is that what you want?
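Calling get_level_values inside the nested comprehension repeats work for every (i, j) pair, which matters for the bigger dataframes mentioned in the question. A vectorized sketch of the same selection computes the group masks once and combines them with np.triu_indices. It assumes df is square with rows and columns in the same order (as in the symmetric example), and the function name is hypothetical:

```python
import numpy as np
import pandas as pd

def extract_vals_vectorized(target, level, df):
    """Values above the diagonal whose row or column belongs to `target`.

    Assumes df is square with rows and columns in the same order.
    """
    in_rows = np.asarray(df.index.get_level_values(level) == target)
    in_cols = np.asarray(df.columns.get_level_values(level) == target)
    i, j = np.triu_indices(len(df), k=1)  # strictly upper triangle
    keep = in_rows[i] | in_cols[j]        # row OR column is in the group
    return df.to_numpy()[i[keep], j[keep]]

df_index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [1, 2, 3, 4]], names=["group", "id"])
df = pd.DataFrame(
    [[1.0, 0.5, 0.3, -0.4],
     [0.5, 1.0, 0.9, -0.8],
     [0.3, 0.9, 1.0, 0.1],
     [-0.4, -0.8, 0.1, 1.0]],
    index=df_index, columns=df_index)

print(extract_vals_vectorized("A", "group", df))
print(extract_vals_vectorized("B", "group", df))
```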
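All elements above the diagonal (note that the slices below are specific to this 4x4 example and won't generalize to other group layouts):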
In [139]: df.values[np.triu_indices(len(df), 1)]
Out[139]: array([ 0.5, 0.3, -0.4, 0.9, -0.8, 0.1])
A_vals:
In [140]: df.values[np.triu_indices(len(df), 1)][:-1]
Out[140]: array([ 0.5, 0.3, -0.4, 0.9, -0.8])
B_vals:
In [141]: df.values[np.triu_indices(len(df), 1)][1:]
Out[141]: array([ 0.3, -0.4, 0.9, -0.8, 0.1])
Source matrix:
In [142]: df.values
Out[142]:
array([[ 1. , 0.5, 0.3, -0.4],
[ 0.5, 1. , 0.9, -0.8],
[ 0.3, 0.9, 1. , 0.1],
[-0.4, -0.8, 0.1, 1. ]])
I have a pre-defined list that gives data in the form of (min, max, increment). For example:
[[0.0 1.0 0.1 #mass
1.0 5.0 1.0 #velocity
45.0 47.0 1.0 #angle in degrees
0.05 0.07 0.1 #drag coeff.
0.0 0.0 0.0 #x-position
0.0 0.0 0.0]] #y-position
and this goes on for a few more variables. Ideally, I want to take each one in as an individual variable and create a finite list of each value in the given range.
For example, mass would be:
m = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
this way I can utilize itertools.combinations((m, x, b,...), r) to create all possible combinations given the various possibilities of each variable.
Any suggestions?
I'm not sure about your list structure, but if you do need to take slices, you can use itertools.islice and store all the lists in a dict:
from itertools import islice
import numpy as np
l = iter([0.0, 1.0, 0.1,    # mass
          1.0, 5.0, 1.0,    # velocity
          45.0, 47.0, 1.0,  # angle in degrees
          0.05, 0.07, 0.1,  # drag coeff.
          0.0, 0.0, 0.0,    # x-position
          0.0, 0.0, 0.0])   # y-position
d = {}
for v in ("m", "v", "ang", "drg", "x-p", "y-p"):  # put all "variable" names in order
    start, stop, step = islice(l, None, 3)  # take the next three values
    # or use next()
    # start, stop, step = next(l), next(l), next(l)
    if stop > start:  # make sure we have a step to take
        # create key/value pairing; stop + step makes the maximum inclusive
        d[v] = np.arange(start, stop + step, step)
    else:
        # add empty list for zero values
        d[v] = []
print(d)
which prints (Python 3.7+, where dicts keep insertion order):
{'m': array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]), 'v': array([1., 2., 3., 4., 5.]), 'ang': array([45., 46., 47.]), 'drg': array([0.05, 0.15]), 'x-p': [], 'y-p': []}
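You can also create your own range generator that accepts a float step:
def float_range(start, stop, step=1.0):
    # yield start, start + step, ... for as long as the value does not exceed stop
    while start <= stop:
        yield start
        start += step
Then call it with list(float_range(start, stop, step)), but you need to be careful when dealing with floats because of Floating Point Arithmetic: Issues and Limitations (for example, adding 0.1 to 0.0 ten times gives 0.9999999999999999, not 1.0).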
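One way to sidestep the accumulated error is to compute each value as start + k * step from a precomputed count instead of adding step repeatedly. A sketch; the function name and the eps tolerance are illustrative choices, not from any library:

```python
import math

def float_range_by_count(start, stop, step, eps=1e-9):
    """Inclusive float range computed as start + k*step.

    eps is an arbitrary tolerance so that a quotient such as
    9.999999999999998 still counts as 10 whole steps.
    """
    if step == 0:
        return [start]  # e.g. the 0.0 0.0 0.0 x-position rows
    n = math.floor((stop - start) / step + eps)  # whole steps that fit
    return [start + k * step for k in range(n + 1)]

print(float_range_by_count(0.0, 1.0, 0.1))
print(float_range_by_count(45.0, 47.0, 1.0))
```

Because the count is floored, a step that doesn't divide the range evenly (like the 0.1 drag increment) stops at the last value not exceeding the maximum instead of overshooting it.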
You wrote the list as a flat list, with all numbers on the same level
[[0.0 1.0 0.1 1.0 5.0 1.0 45.0 47.0 1.0 ...]]
but it's possible you meant to write it as a nested list
[[0.0, 1.0, 0.1], [1.0, 5.0, 1.0], [45.0, 47.0, 1.0], ...]
so I'll show both solutions. Please let me know how your data/list is actually structured.
Python's range function doesn't support floats, but you can use NumPy's arange.
The try ... except part is for your unchanging values like 0.0 0.0 0.0 #x-position.
Flat list solution:
flat_list = [0.0, 1.0, 0.1,
1.0, 5.0, 1.0,
45.0, 47.0, 1.0,
0.05, 0.07, 0.1,
0.0, 0.0, 0.0,
0.0, 0.0, 0.0]
import numpy as np
incremented_lists = []
for i in range(0, len(flat_list), 3):  # Step in threes
    minimum, maximum, increment = flat_list[i:i+3]
    try:
        incremented_list = list(np.arange(minimum, maximum + increment, increment))
    except ZeroDivisionError:
        incremented_list = [minimum]
    incremented_lists.append(incremented_list)
Nested list solution:
nested_list = [[0.0, 1.0, 0.1],
[1.0, 5.0, 1.0],
[45.0, 47.0, 1.0],
[0.05, 0.07, 0.1],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]]
import numpy as np
incremented_lists = []
for sub_list in nested_list:
    minimum, maximum, increment = sub_list
    try:
        incremented_list = list(np.arange(minimum, maximum + increment, increment))
    except ZeroDivisionError:
        incremented_list = [minimum]
    incremented_lists.append(incremented_list)
Running either of these with Python 2.7 or Python 3.3 gets this:
incremented_lists: [[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
[1.0, 2.0, 3.0, 4.0, 5.0],
[45.0, 46.0, 47.0],
[0.05, 0.15],
[0.0],
[0.0]]
The [0.05, 0.15] is probably undesirable, but I think your huge 0.1 increment for the drag coefficient is more likely a typo than something I should make the code handle. Please let me know if you would like the code to handle unnatural increments and avoid overshooting the maximum. One way to handle that would be to add incremented_list = [x for x in incremented_list if x <= maximum] right before incremented_lists.append(incremented_list), though I'm sure there's a cleaner way to do it.
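Once there is one list per variable, note that itertools.combinations picks r items out of a single iterable; to get every combination of one value from each variable, itertools.product is the usual tool. A sketch with small hypothetical lists standing in for the generated ones:

```python
import itertools

# hypothetical short per-variable lists, e.g. from incremented_lists
mass = [0.0, 0.5, 1.0]
velocity = [1.0, 2.0]
angle = [45.0]

# product picks one value from each list: 3 * 2 * 1 = 6 tuples
grid = list(itertools.product(mass, velocity, angle))
print(len(grid))
print(grid[0])
```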
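I can't think of any existing format supporting your desired input: spaces as separators, newlines breaking sub-lists, and comments that are actually meaningful, since you appear to want them to define the sub-lists' names. So I think you'll have to code your own parser, e.g.:
import re
import numpy as np
# "min max increment #name", optionally wrapped in [[ ... ]]
pattern = re.compile(r'\s*\[*\s*([^\s\]]+)\s+([^\s\]]+)\s+([^\s\]]+)\]*\s*#(\w)')
res_dict = {}
with open('thefile.txt') as f:
    for line in f:
        mo = pattern.match(line)
        if mo is None:  # skip blank or malformed lines
            continue
        keybase = mo.group(4)  # first letter of the comment, e.g. 'm' for mass
        keyadd = 0
        key = keybase
        while key in res_dict:  # don't overwrite an earlier variable
            key = '{}{}'.format(keybase, keyadd)
            keyadd += 1
        start, stop, step = (float(mo.group(n)) for n in (1, 2, 3))
        if step == 0:
            res_dict[key] = np.array([start])  # np.arange cannot use a zero step
        else:
            res_dict[key] = np.arange(start, stop + step, step)  # stop + step includes the maximum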
This won't give you a top-level variable m as you mention -- but rather a better-structured, more robust res_dict['m'] instead. If you insist on making your code brittle and fragile, you can globals().update(res_dict) to make it so:-)...