So in my code, I have a list (game.patrolled) being accessed, and when I print the list, I get:
[40, 44, 46, 39]
This is what it should be, this part is correct.
When I print the loop variable (x) on each iteration of my for loop, I get:
40
48
44
49
37
39
42
46
47
This is also what I should get, this part is also correct.
But when I have it print (x in game.patrolled) on every iteration, I get:
False
False
False
False
False
False
False
False
False
This is wrong: four of these values should be True, since they are in the list, as you can clearly see above. Is there any reason why the "in" operator doesn't seem to be working?
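The usual culprit for this symptom is a type mismatch: in compares elements with ==, which never matches across types such as str and int. A minimal sketch (the list values come from the question; the string types are an assumption about the surrounding code):

patrolled = [40, 44, 46, 39]
for x in ["40", "44", "37"]:      # e.g. values read in as strings
    print(x in patrolled)         # always False: "40" != 40
    print(int(x) in patrolled)    # True, True, False once the types agree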
I have a pandas.core.series.Series object that looks like this:
6 7
8 9
18 19
35 36
42 43
I want to get a list of 3 randomly chosen numbers from this Series. I followed the advice here and tried
sampled_list = random.sample(df['ID'], 3)
with no luck. Any suggestions?
I think you're looking for:
df['ID'].sample(3).tolist()
Docs: Series.sample()
Alternatively, convert the Series to a plain list so random.sample can index it:
sampled_list = random.sample(list(df['ID']), 3)
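A self-contained sketch of both approaches (the column name 'ID' comes from the question; the values are made up):

import random
import pandas as pd

df = pd.DataFrame({"ID": [7, 9, 19, 36, 43]})
print(df["ID"].sample(3).tolist())        # pandas-native sampling
print(random.sample(list(df["ID"]), 3))   # stdlib sampling on a plain list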
I have a pandas dataframe that looks like so:
datetime Online TEST
61 2018-03-03 True False
62 2018-03-04 True False
63 2018-03-05 True False
64 2018-03-06 True False
65 2018-03-07 True False
66 2018-03-08 True False
67 2018-03-09 True False
68 2018-03-10 True False
69 2018-03-11 False False
70 2018-03-12 False False
I need to check, for each False in the TEST column, whether there is a False in the Online column within a date range of plus or minus 7 days (i.e. plus or minus timedelta(days=7)). For example, on 2018-03-03, since TEST is False, I would want to check all days within that window for False values in the Online column. Since there are no False Online values within a 7-day time frame, we would return False. On the other hand, consider 2018-03-09, where Online is True and TEST is False. Since there is a False in Online on 2018-03-11, I need to return a boolean True saying that there was a False within my 7-day time range.
I can achieve this using some slow and ugly looping (i.e. go through each row using DataFrame.iterrows(), check if TEST is False, then pull the window of plus or minus 7 days to see if Online has a corresponding False value). But I would ideally like something snazzier and faster. For a visual, this is what I need my final dataframe to look like:
datetime Online TEST Check
61 2018-03-03 True False False
62 2018-03-04 True False True
63 2018-03-05 True False True
64 2018-03-06 True False True
65 2018-03-07 True False True
66 2018-03-08 True False True
67 2018-03-09 True False True
68 2018-03-10 True False True
69 2018-03-11 False False True
70 2018-03-12 False False True
Any ideas out there? Thanks in advance!
Building on @piRSquared's great comments (I didn't even know about the rolling method, it seems very useful!), you can use
check = ~(df.TEST + df.Online.rolling(15, center=True, min_periods=1).apply(np.prod).eq(1))
The second operand builds a boolean Series saying, for each row, whether there is no False value in a window of 15 rows centered on it (plus or minus 7 rows, i.e. plus or minus 7 days given one row per day); it does this by multiplying (NumPy's prod function) all the values inside the rolling window, which yields 1 only when every value is True.
Adding the two boolean Series acts as an element-wise OR, and the inversion operator ~ flips the result, so Check is True exactly when TEST is False and the window does contain a False Online value.
Hope it helps.
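A self-contained sketch of the whole approach, assuming one row per consecutive day (so a centered 15-row window spans plus or minus 7 days), and using | instead of the original + (on boolean Series both act as element-wise OR; raw=True just hands np.prod a NumPy array):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.date_range("2018-03-03", periods=10),
    "Online": [True] * 8 + [False] * 2,
    "TEST": [False] * 10,
})

# eq(1) marks rows whose +/-7-row window contains no False Online value
all_online = df.Online.rolling(15, center=True, min_periods=1).apply(np.prod, raw=True).eq(1)
df["Check"] = ~(df.TEST | all_online)  # True: TEST is False and a False Online is in range
print(df)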
cleanedList = [x for x in range(0, 100, 1)]
idx = 0
for val in cleanedList:
    check = abs(cleanedList[idx])
    idx = idx + 1
    if check % 5 == 0:  ##### Condition met: swap in a new list for the loop
        cleanedList = list(range(60, 100, 2))  # intended: the loop should continue over this new list
This is an arbitrary example. I want to change the list being looped over when the condition is met. I tried it this way, but I don't think it actually changed the list the loop runs over. Please correct me.
It is not advisable to change the list over which you are looping. However, if this is what you really want, then you could do it this way:
cleanedList = list(range(0, 100, 1))
for i, _ in enumerate(cleanedList):
    check = abs(cleanedList[i])
    if check % 5 == 0:  ##### Change the list it is looping over
        cleanedList[:] = range(60, 100, 2)  # slice assignment mutates in place
This is an interesting one, because you haven't actually mutated the list.
cleanedList = [x for x in range(0, 100, 1)]  # creates list1
idx = 0
for val in cleanedList:  # begin iterating list1; the loop holds its own reference to it
    check = abs(cleanedList[idx])
    print(val, check, end=' ')
    idx = idx + 1
    if check < 30:  ##### Change the list it is looping over
        cleanedList = [x for x in range(60, 100, 2)]  # reassign here, but it becomes list2
The output tells the story:
0 0 1 62 2 64 3 66 4 68 5 70 6 72 7 74 8 76 9 78 10 80 11 82 12 84 13 86 14 88 15 90 16 92 17 94 18 96 19 98
Because you didn't mutate the list, you reassigned the name: the for loop still holds a reference to the list you were initially iterating over, while cleanedList[idx] now indexes list2. The loop runs well past the end of list2, which is why you eventually get an IndexError: there are 100 items in your first list and only 20 in your second.
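The distinction in a tiny sketch: slice assignment mutates the object that every name refers to, while plain assignment rebinds just the one name:

a = [1, 2, 3]
b = a             # b refers to the same list object
a[:] = [9, 9]     # mutation: print(b) now shows [9, 9]
a = [1, 2, 3]     # rebinding: b still shows [9, 9]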
Very briefly: when you want to edit a list you're iterating over, iterate over a copy of it instead. Your loop simply becomes:
for val in cleanedList[:]:
and you can make all kinds of edits to the original cleanedList and no error will show up.
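A quick sketch of why the copy is safe: the loop walks the snapshot while the edits land on the original list:

cleanedList = list(range(10))
for val in cleanedList[:]:         # iterate over a snapshot
    if val % 2 == 0:
        cleanedList.remove(val)    # edit the original freely
print(cleanedList)                 # [1, 3, 5, 7, 9]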
Task: Search a multi-column dataframe for a value (all values are unique) and return the index of that row.
Currently I'm using get_loc, but it only seems to allow passing a single column at a time, which results in a rather ineffective chain of try/except statements. Although it works, is anyone aware of a more effective way to do this?
df = pd.DataFrame(np.random.randint(0, 100, size=(4, 4)), columns=list('ABCD'))
try:
    unique_index = pd.Index(df['A'])
    print(unique_index.get_loc(20))
except KeyError:
    try:
        unique_index = pd.Index(df['B'])
        print(unique_index.get_loc(20))
    except KeyError:
        unique_index = pd.Index(df['C'])
        print(unique_index.get_loc(20))
Loops don't seem to work because of the KeyError raised when a column doesn't contain the value. I've looked at functions such as .contains or .isin, but it's the location index that I'm interested in.
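For the record, a plain loop does work as long as the KeyError is caught inside it; a small sketch that collapses the try/except chain above:

for col in ['A', 'B', 'C']:
    try:
        print(pd.Index(df[col]).get_loc(20))
        break  # found it; stop at the first matching column
    except KeyError:
        continue  # 20 is not in this column; try the next one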
You could use np.where, which returns a tuple of row and column indices where your value is present. You can then select just the row from this.
df = pd.DataFrame(np.random.randint(0,100,size=(4, 4)), columns=list('ABCD'))
indices = np.where(df.values == 20)
rows = indices[0]
if len(rows) != 0:
    print(rows[0])
Consider this example instead, using np.random.seed for reproducibility:
np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(200, size=(4, 4)),
    columns=list('ABCD'))
df
A B C D
0 11 98 123 90
1 143 126 55 141
2 139 141 154 115
3 63 104 128 120
We can find where the values are what you're looking for using np.where and slicing. Notice that I used a value of 55 because that's what appeared in the data generated from the seed I chose. This will work just fine for 20 if it is in your data set; in fact, it'll work even if the value occurs more than once.
i, j = np.where(df.values == 55)
list(zip(df.index[i], df.columns[j]))
[(1, 'C')]
Use vectorized operations and boolean indexing:
df[(df==20).any(axis=1)].index
Another way:
df[df.eq(20)].stack()
Out[1220]:
1 C 20.0
dtype: float64
Since other posters used np.where(), I'll give another option using any().
df.loc[df.isin([20]).any(axis=1)].index
df.isin([20]) returns True wherever a cell matches, and .any(axis=1) collapses that to a single boolean per row, so df.loc[...] filters down to just the rows containing a match.
Here is an example df:
A B C D
0 82 7 48 90
1 68 18 90 14 #< ---- notice the 18 here
2 18 34 72 24 #< ---- notice the 18 here
3 69 73 40 86
df.isin([18])
A B C D
0 False False False False
1 False True False False #<- ---- notice the TRUE value
2 True False False False #<- ---- notice the TRUE value
3 False False False False
print(df.loc[df.isin([18]).any(axis=1)].index.tolist())
#output is a list
[1, 2]
I am attempting to search a pandas dataframe for values that match a target within a given uncertainty. For instance, if I have a dataframe:
A B C
0 12 12.6 111.20
1 14 23.4 112.20
2 16 45.6 112.30
3 18 56.6 112.40
4 27 34.5 121.60
5 29 65.2 223.23
6 34 45.5 654.50
7 44 65.6 343.50
How can I search for a value that matches 112.6 +/- 0.4 without having to build long and clumsy criteria like:
TargetVal_Max = 112.6 + 0.4
TargetVal_Min = 112.6 - 0.4
Basically, I want to create a "buffer window" that returns all values falling within the window. I have the uncertainties package, but have yet to get it working for this.
Ideally, I'd like to be able to return all index values that match a value in both C and B within a given error range.
Edit
As pointed out by @MaxU, the np.isclose function works very well if you know the exact number. But is it possible to match against a list of values, so that if I had a second dataframe I could check whether the values in C from one match the values in C of the other within a tolerance? I have attempted to get them into a list and do it this way, but I run into problems when attempting it for more than a single value at a time.
TEST = Dataframe_2["C"]
HopesNdreams = sample[sample["C"].apply(np.isclose, b=TEST, atol=1.0)]
Edit 2
I found, after trying a couple of different workarounds, that I can just do:
TEST1 = Dataframe_2["C"].tolist()  # note the (): .tolist alone is just the method object
for i in TEST1:
    HopesNdreams = sample[sample["C"].apply(np.isclose, b=i, atol=1.0)]
And this returns the hits for the given column. Using the logic set forth in the first answer, I think this will work very well for what I need it to. Are there any hangups that I don't see with this method?
Cheers and thanks for the help!
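One hangup with the Edit 2 loop: each pass reassigns HopesNdreams, so only the hits for the last target survive. A vectorized sketch that checks all targets at once (column and variable names taken from the question):

targets = Dataframe_2["C"].values
mask = np.isclose(sample["C"].values[:, None], targets, atol=1.0).any(axis=1)
HopesNdreams = sample[mask]  # rows of sample whose C is within 1.0 of any target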
IIUC you can use the np.isclose() function:
In [180]: df[['B','C']].apply(np.isclose, b=112.6, atol=0.4)
Out[180]:
B C
0 False False
1 False True
2 False True
3 False True
4 False False
5 False False
6 False False
7 False False
In [181]: df[['B','C']].apply(np.isclose, b=112.6, atol=0.4).any(axis=1)
Out[181]:
0 False
1 True
2 True
3 True
4 False
5 False
6 False
7 False
dtype: bool
In [182]: df[df[['B','C']].apply(np.isclose, b=112.6, atol=0.4).any(axis=1)]
Out[182]:
A B C
1 14 23.4 112.2
2 16 45.6 112.3
3 18 56.6 112.4
Use Series.between() (note that the lower bound comes first):
df['C'].between(112.6 - .4, 112.6 + .4)
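To pull the matching rows, or just their index, from that boolean mask:

mask = df['C'].between(112.6 - 0.4, 112.6 + 0.4)
df[mask]        # the matching rows
df[mask].index  # just the row indices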