Partial Indexing Error in Python Series [duplicate] - python

This question already has answers here:
key error and MultiIndex lexsort depth
(1 answer)
What exactly is the lexsort_depth of a multi-index Dataframe?
(1 answer)
Closed 5 years ago.
I have created a Hierarchical indexed Series and I wanted to partially index some values of the Series. But When I changed the alphabetic order of the Series. The partially indexing is not working. Can anybody explain why is this happening?
with Some better and logical explanation.
sr = Series(np.arange(11),index=[['a','b','b','c','d','d','e','e','f','f','f'],[1,2,1,3,1,2,1,2,1,2,3]])
print (sr['a':'c'])
This gives the resultant output but when I change the alphabetic order of the indexes, the partial indexing gives an error.
hs = Series(np.arange(10),index=[['a','a','b','b','c','c','d','e','e','a'],[1,0,2,1,0,1,1,3,2,3]])
print(hs['a':'c'])
pandas.errors.UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'

Related

Conditional check in pandas dataframe loc [duplicate]

This question already has answers here:
Logical operators for Boolean indexing in Pandas
(4 answers)
Pandas column access w/column names containing spaces
(6 answers)
Closed 3 years ago.
I'm referring to this document https://datatofish.com/if-condition-in-pandas-dataframe/
The part - (3) IF condition - strings
I'm trying to implement it with 2 conditions as:
x.loc[x.Test Status == 'Finished' and x.Results Validation == 'In Limits', 'Outcome'] = 'PASS'
I've a invalid syntax error. How do I handle this? I've tried multiple workarounds like np.where but no luck.

How to get the second largest value in Pandas Python [duplicate]

This question already has answers here:
Get first and second highest values in pandas columns
(7 answers)
Closed 4 years ago.
This is my code:
maxData = all_data.groupby(['Id'])[features].agg('max')
all_data = pd.merge(all_data, maxData.reset_index(), suffixes=["", "_max"], how='left', on=['Id'])
Now Instead of getting the max value, How can I fetch the second max value in the above code (groupBy Id)
Try using nlargest
maxData = all_data.groupby(['Id'])[features].apply(lambda x:x.nlargest(2)[1]).reset_index(drop=True)
You can use the nth method just after sorting the values;
maxData = all_data.sort_values("features", ascending=False).groupby(['Id']).nth(1)
Please ignore apply method as it decreases performance of code.

Python: len() of unknown numpy array column length [duplicate]

This question already has answers here:
Counting the number of non-NaN elements in a numpy ndarray in Python
(5 answers)
Closed 4 years ago.
I'm currently trying to learn Python and Numpy. The task is to determine the length of individual columns of an imported CSV file.
So far I have:
import numpy as np
data = np.loadtxt("assignment5_data.csv", delimiter = ',')
print (data.shape[:])
Which returns:
(62, 2)
Is there a way to iterate through each column to count [not is.nan]?
If I understand correctly, and you are trying to get the length of non-nan values in each column, use:
np.sum(~np.isnan(data),axis=0)

Pandas error - Truth Value of a Series is Ambiguous when using iloc [duplicate]

This question already has answers here:
pandas comparison raises TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
(2 answers)
Selecting with complex criteria from pandas.DataFrame
(5 answers)
Closed 5 years ago.
I am using a pandas dataframe and I am trying to select rows where the yearID == 2001 and the team_IDx == 'OAK'. The yearID column is of type int and team_IDx is an object. Here's the expression I'm using:
mergeddf.loc[(mergeddf['yearID'] == 2001 & mergeddf['teamID_x'] == 'OAK')]
But I keep getting the error:
TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
I'm a beginner and not even sure how to phrase my question. I've looked at other answers on stack overflow, but they don't make sense to me. What does this error mean? What underlying concepts should I know about to be able to understand it on my own? How do I resolve this?
This is due to the operator precedence of the bitwise operators, which have higher precedence than logical operators. You need another layer of parentheses around each condition:
mergeddf.loc[((mergeddf['yearID'] == 2001) & (mergeddf['teamID_x'] == 'OAK'))]

converting pandas data frame to dictionary of tuples [duplicate]

This question already has answers here:
pandas DataFrame to dict with values as tuples
(2 answers)
Closed 5 years ago.
I am using the following code to convert a Dataframe whose structure is as follows
dummy= df.set_index(['location']).T.to_dict('list')
for key,value in dummy.items():
dummy[key] = tuple(value)
to obtain a dictionary of tuples
{loc_1:(35.99,-81.44),loc_2:(22.55,-108.5)}
Question
1. Will the order be preserved as lat-long? (Is there a chance the first tuple can turn out to be (-81.44,35.99)?
Question 2. Is there a better(faster/elegant)way of doing the above
Using a comprehension and itertuples
dict([(t.location, (t.lat, t.long)) for t in df.itertuples()])
{loc_1: (35.99, -81.44), loc_2: (22.55, -108.5)}

Categories

Resources