Key Errors when accessing some indices of pandas series [duplicate] - python

This question already has answers here:
How are iloc and loc different?
(6 answers)
Closed 2 years ago.
I am working with the OULAD dataset in pandas, and I'm trying to view the labels of some specific rows. For some reason, some indices raise a KeyError and some do not.
Code:
labels = info["final_result"].copy()
print(type(labels))
print(labels)
gives the result:
<class 'pandas.core.series.Series'>
21847 Fail
19351 Fail
10841 Withdrawn
4360 Withdrawn
8991 Withdrawn
...
29976 Distinction
629 Withdrawn
7329 Pass
25941 Pass
21098 Pass
Name: final_result, Length: 26074, dtype: object
For example,
print(labels[10])
prints out:
pass
which is the correct label.
However,
print(labels[9])
for whatever reason, results in:
KeyError: 9
Any ideas?

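Probably the label 9 does not exist in the index. labels[9] looks up the index label 9, not the tenth row by position, and your index (21847, 19351, 10841, ...) is not 0..n-1, so some labels are simply missing. Use labels.iloc[9] for positional access.
Look at this example:
import pandas as pd
tmp = pd.DataFrame({"a":[0,1,2,3]})
tmp = tmp.drop(1)  # removes the row with label 1
tmp["a"][1]  # label 1 no longer exists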
KeyError: 1
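A small sketch of the difference between label-based and position-based lookup. The Series below is a hypothetical stand-in for info["final_result"], with a shuffled integer index like the one printed in the question:

```python
import pandas as pd

# Hypothetical stand-in for info["final_result"]: the index labels are not
# 0..n-1, as happens after shuffling or a train/test split.
labels = pd.Series(["Fail", "Withdrawn", "Pass"], index=[21847, 10, 629])

# Label-based lookup: 10 is one of the index labels, so this works.
by_label = labels.loc[10]

# 9 is not an index label, so a label-based lookup raises KeyError.
has_nine = 9 in labels.index
try:
    labels.loc[9]
    raised = False
except KeyError:
    raised = True

# Position-based lookup: .iloc counts rows from 0, ignoring the labels.
by_position = labels.iloc[0]
```

With an integer index, plain labels[...] behaves like .loc, which is why labels[10] works and labels[9] does not.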

Related

Reorder dataframe groupby medians following custom order [duplicate]

This question already has answers here:
How to sort pandas dataframe by custom order on string index
(5 answers)
Closed 18 days ago.
I have a dataset containing a bunch of data in the columns params and value. I'd like to count how many values each params category contains (to use as labels in a boxplot), so I use mydf['params'].value_counts(), which shows:
slidingwindow_250 11574
hotspots_1k_100 8454
slidingwindow_500 5793
slidingwindow_100 5366
hotspots_5k_500 3118
slidingwindow_1000 2898
hotspots_10k_1k 1772
slidingwindow_2500 1160
slidingwindow_5000 580
Name: params, dtype: int64
I have a list of all of the entries in params in the order I wish to display them in a boxplot. I try to use sort_index(level=myorder) to get them in my custom order, but the function ignores myorder and just sorts them alphabetically.
myorder = ["slidingwindow_100",
"slidingwindow_250",
"slidingwindow_500",
"slidingwindow_1000",
"slidingwindow_2500",
"slidingwindow_5000",
"hotspots_1k_100",
"hotspots_5k_500",
"hotspots_10k_1k"]
sizes_bp_log_df['params'].value_counts().sort_index(level=myorder)
hotspots_10k_1k 1772
hotspots_1k_100 8454
hotspots_5k_500 3118
slidingwindow_100 5366
slidingwindow_1000 2898
slidingwindow_250 11574
slidingwindow_2500 1160
slidingwindow_500 5793
slidingwindow_5000 580
Name: params, dtype: int64
How can I get the index of my value counts in the order I want them to be in?
In addition, I'll be using the median of each distribution as coordinates for the boxplot labels too, which I retrieve using sizes_bp_log_df.groupby(['params']).median(); hopefully your suggested sort methods will also work for that task.
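Use reindex instead of sort_index. The level argument of sort_index selects which MultiIndex level to sort on; it does not accept a custom order. reindex(myorder) returns the rows in exactly the order of the list you pass, and it works the same way on the result of groupby(...).median().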
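A minimal sketch, assuming a hypothetical miniature frame df in place of sizes_bp_log_df and a shortened myorder. Any name in the list that is missing from the index would come back as NaN, so the list should match the categories exactly:

```python
import pandas as pd

# Hypothetical miniature of sizes_bp_log_df with columns 'params' and 'value'.
df = pd.DataFrame({
    "params": ["hotspots_1k_100", "slidingwindow_100", "slidingwindow_250",
               "slidingwindow_100", "hotspots_1k_100", "hotspots_1k_100"],
    "value": [5, 1, 3, 2, 7, 9],
})

myorder = ["slidingwindow_100", "slidingwindow_250", "hotspots_1k_100"]

# reindex puts the rows in exactly the order given by myorder.
counts = df["params"].value_counts().reindex(myorder)

# The same call orders the per-group medians for the boxplot labels.
medians = df.groupby("params")["value"].median().reindex(myorder)
```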

Is loc an optional attribute when searching dataframe? [duplicate]

This question already has answers here:
What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?
(4 answers)
Closed 1 year ago.
Both the following lines seem to give the same output:
df1 = df[df['MRP'] > 1500]
df1 = df.loc[df['MRP'] > 1500]
Is loc an optional attribute when searching dataframe?
From the pandas.DataFrame.loc documentation:
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
When you filter rows with a boolean array, .loc is optional. In your example, df['MRP'] > 1500 returns a boolean Series, so df[...] and df.loc[...] select the same rows and .loc is not necessary.
df[df['MRP']>15]
MRP cat
0 18 A
3 19 D
6 18 C
But if you also want to select specific columns for the rows where this boolean Series is True, use .loc with the mask and a column label:
df.loc[df['MRP']>15, 'cat']
0 A
3 D
6 C
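Or, if you want to change the values where the condition is True (here .loc is needed; chained indexing such as df[mask]['cat'] = ... may write to a copy and raise a SettingWithCopyWarning):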
df.loc[df['MRP']>15, 'cat'] = 'found'
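A minimal sketch of all three cases. The data is hypothetical, chosen only so that the mask matches rows 0, 3 and 6 as in the output above:

```python
import pandas as pd

# Hypothetical sample data; rows 0, 3 and 6 have MRP > 15.
df = pd.DataFrame({"MRP": [18, 10, 12, 19, 5, 7, 18],
                   "cat": list("ABCDEFC")})

mask = df["MRP"] > 15

# With a boolean mask, [] and .loc select the same rows.
same_rows = df[mask].equals(df.loc[mask])

# .loc also accepts a column label alongside the row mask.
cats = df.loc[mask, "cat"]

# Assigning through .loc modifies df itself.
df.loc[mask, "cat"] = "found"
```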

Python math operation on column [duplicate]

This question already has answers here:
Convert pandas.Series from dtype object to float, and errors to nans
(3 answers)
Closed 3 years ago.
The data from JSON is in a DataFrame, and I am trying to output it to a CSV.
I am trying to multiply a DataFrame column by a fixed value, but the result is not displayed the way I expect.
I have tried both of the following, but the output is still wrong:
df_entry['Hours'] = df_entry['Hours'].multiply(2)
df_entry['Hours'] = df_entry['Hours'] * 2
Input
ID, name,hrs
100,AB,37.5
Expected
ID, name,hrs
100,AB,75.0
What I am getting
ID, name,hrs
100,AB,37.537.5
This happens because the column holds strings (dtype object), so * 2 repeats each string: '37.5' * 2 gives '37.537.5'. You need to convert the column to float before multiplying.
df_entry['Hours'] = df_entry['Hours'].astype(float) * 2
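You can also use the apply function (convert with float directly; int('37.5') would raise a ValueError, and int would truncate the decimals anyway):
df_entry['Hours'] = df_entry['Hours'].apply(lambda x: float(x) * 2)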
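The linked duplicate suggests pd.to_numeric, which also handles values that cannot be parsed. A minimal sketch, assuming a hypothetical frame that mirrors the question plus one invented unparseable entry:

```python
import pandas as pd

# Hypothetical frame mirroring the question: 'Hours' arrived from JSON as
# strings. The second row is an invented bad value for illustration.
df_entry = pd.DataFrame({
    "ID": [100, 101],
    "name": ["AB", "CD"],
    "Hours": pd.Series(["37.5", "n/a"], dtype=object),
})

# Multiplying strings repeats them instead of doing arithmetic.
repeated = df_entry["Hours"] * 2

# pd.to_numeric converts to numbers; errors="coerce" turns entries it
# cannot parse into NaN instead of raising.
df_entry["Hours"] = pd.to_numeric(df_entry["Hours"], errors="coerce") * 2
```

Compared with astype(float), which raises on the first bad value, errors="coerce" lets you inspect the NaN rows afterwards.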

Pandas: Add a scalar to multiple new columns in an existing dataframe [duplicate]

This question already has answers here:
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 4 years ago.
I recently answered a question where the OP was looking to add multiple columns with different values to an existing dataframe (link). My answer is fairly succinct, but I don't think it is very fast.
Ultimately I was hoping I could do something like:
# Existing dataframe
df = pd.DataFrame({'a':[1,2]})
df[['b','c']] = 0
Which would result in:
a b c
1 0 0
2 0 0
But it throws an error.
Is there a super simple way to do this that I'm missing? Or is the answer I posted earlier the fastest / easiest way?
NOTE
I understand this could be done via loops, or by assigning scalars to multiple columns, but I am trying to avoid that if possible. Assume 50 columns, or whatever number you wouldn't want to write out:
df['b'], df['c'], ..., df['xyz'] = 0, 0, ..., 0
Not a duplicate:
The "Possible duplicate" question suggested to this shows multiple different values assigned to each column. I'm simply asking if there is a very easy way to assign a single scalar value to multiple new columns. The answer could correctly and very simply be, "No" - but worth knowing so I can stop searching.
Why not use assign? (It returns a new DataFrame, so rebind it with df = df.assign(...) to keep the new columns.)
df.assign(**dict.fromkeys(['b','c'],0))
Out[781]:
a b c
0 1 0 0
1 2 0 0
Or, for a different value per column, build the dict with d = dict(zip(namelist, valuelist)) and pass it as df.assign(**d).
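A minimal sketch of both variants; the column names in new_cols, names and values are hypothetical placeholders for however many columns you need:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# Hypothetical list of new column names; could just as well be 50 of them.
new_cols = ["b", "c", "d"]

# dict.fromkeys maps every name to the same scalar, and assign broadcasts
# each scalar down its column. assign returns a new DataFrame, so rebind df.
df = df.assign(**dict.fromkeys(new_cols, 0))

# A different value per column: pair names and values with zip.
names = ["x", "y"]
values = [10, 20]
df = df.assign(**dict(zip(names, values)))
```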
I think you want to do
df['b'], df['c'] = 0, 0

Partial Indexing Error in Python Series [duplicate]

This question already has answers here:
key error and MultiIndex lexsort depth
(1 answer)
What exactly is the lexsort_depth of a multi-index Dataframe?
(1 answer)
Closed 5 years ago.
I have created a hierarchically indexed Series, and I want to select some of its values by partial indexing (slicing on the first level only). But when I change the alphabetical order of the first-level labels, partial indexing stops working. Can anybody explain, with a logical explanation, why this happens?
import numpy as np
from pandas import Series
sr = Series(np.arange(11),index=[['a','b','b','c','d','d','e','e','f','f','f'],[1,2,1,3,1,2,1,2,1,2,3]])
print (sr['a':'c'])
This gives the expected output, but when the first-level labels are not in alphabetical order, the same partial slice raises an error:
hs = Series(np.arange(10),index=[['a','a','b','b','c','c','d','e','e','a'],[1,0,2,1,0,1,1,3,2,3]])
print(hs['a':'c'])
pandas.errors.UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
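The linked duplicates explain that slicing a MultiIndex by label requires the index to be sorted (lexsorted) at least as deep as the slice key; in hs the final 'a' breaks the order of the first level, so the lexsort depth is 0. A minimal sketch of the usual fix, sorting the index once with sort_index before slicing, using the hs data from the question:

```python
import numpy as np
import pandas as pd

hs = pd.Series(np.arange(10),
               index=[["a", "a", "b", "b", "c", "c", "d", "e", "e", "a"],
                      [1, 0, 2, 1, 0, 1, 1, 3, 2, 3]])

# The trailing 'a' means the MultiIndex is not in sorted order.
is_sorted = hs.index.is_monotonic_increasing

# Sort the MultiIndex by all levels; label slicing on the first level
# then works because every 'a' precedes every 'b', and so on.
hs_sorted = hs.sort_index()
subset = hs_sorted.loc["a":"c"]
```

Note that sort_index also reorders the values, so subset lists the 'a' rows as ('a', 0), ('a', 1), ('a', 3).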
