So I'm sure this is pretty trivial but I'm pretty new to python/pandas.
I want to get a certain column (the names of my measurements) of my MultiIndex as a list, to use it later in a for loop to name and save my plots. I'm pretty confident in getting the data I need from my dataframe, but I can't figure out how to get certain columns from my index.
So actually, while writing the question I kind of figured the answer out, but it still seems kind of clunky. There has to be a more direct command to do this.
This is my code:
a = df.index.get_level_values('File')
a = a.drop_duplicates()
a = a.values
index.levels
You can access unique elements of each level of your MultiIndex directly:
import pandas as pd

df = pd.DataFrame([['A', 'W', 1], ['B', 'X', 2], ['C', 'Y', 3],
                   ['D', 'X', 4], ['E', 'Y', 5]])
df = df.set_index([0, 1])
a = df.index.levels[1]
print(a)
Index(['W', 'X', 'Y'], dtype='object', name=1)
To understand the information available, see how the Index object is stored internally:
print(df.index)
MultiIndex(levels=[['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y']],
           labels=[[0, 1, 2, 3, 4], [0, 1, 2, 1, 2]],
           names=[0, 1])
However, the below methods are more intuitive and better documented.
One point worth noting is you don't have to explicitly extract the NumPy array via the values attribute. You can iterate Index objects directly. In addition, method chaining is possible and encouraged with Pandas.
drop_duplicates / unique
Returns an Index object, with order preserved.
a = df.index.get_level_values(1).drop_duplicates()
# equivalently, df.index.get_level_values(1).unique()
print(a)
Index(['W', 'X', 'Y'], dtype='object', name=1)
set
Returns a set. Useful for O(1) lookup, but result is unordered.
a = set(df.index.get_level_values(1))
print(a)
{'X', 'Y', 'W'}
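For the original goal of naming and saving plots in a loop, the unique values can be iterated directly without converting to a list first. A minimal sketch, assuming the asker's dataframe with a 'File' index level and matplotlib for the plots (both are assumptions not shown in the question):

import matplotlib.pyplot as plt

# Iterate the unique values of the 'File' level directly; the plotting body is illustrative only.
for file_name in df.index.get_level_values('File').unique():
    subset = df.xs(file_name, level='File')   # rows belonging to this measurement
    subset.plot()
    plt.savefig('{}.png'.format(file_name))
    plt.close()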
Related
I have a dataframe that has columns A to N, for example. Then I did:
groups = df.groupby(['E', 'D', 'B', 'G', 'I'])
stats = pd.concat(
    [
        groups['N'].mean().rename('N_mean'),
        groups['H'].median().rename('H_median')
    ],
    axis=1
)
stats = stats[stats['N_mean'] > 0]
Now if I print stats, the index is ('E', 'D', 'B', 'G', 'I'). However, many of these levels contain only a single value, which means they are insignificant. I know I can determine which levels are insignificant and then call stats.index.droplevel(...), but is there already a builtin method that does this automatically? (A sketch of the manual approach is below.)
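A minimal sketch of the manual approach described above, assuming stats from the code block has a MultiIndex on the rows:

# Find index levels that hold only a single unique value and drop them.
single_value_levels = [name for name in stats.index.names
                       if stats.index.get_level_values(name).nunique() == 1]
if single_value_levels:
    stats = stats.droplevel(single_value_levels)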
I'm trying to alter a list of lists in multiple ways by using a function (as I will have more than one list of lists).
I know how to change something once, but how do I do more than that? I get the error:
AttributeError: 'int' object has no attribute 'insert'
I understand that the error essentially means (whatever I'm trying to use .insert() on is not a list) but I don't quite understand why it's not a list...
See my code below:
This works and gives me the desired output
list_of_list3 = [['a', 1], ['b', 2], ['c', 3]]
list_to_add = ['Z', 'X', 'Y']
for list_position in range(len(list_of_list3)):
    original_list = list_of_list3[list_position]
    element_to_add = list_to_add[list_position]
    original_list.insert(0, element_to_add)
print(list_of_list3)
This will give me what I want:
[['Z', 'a', 1], ['X', 'b', 2], ['Y', 'c', 3]]
However, what I need is a function which does more than one thing at once. I am trying the code below:
def output_function(add_list, list_of_list):
    for list_position in range(len(list_of_list)):
        list_within_list = list_of_list[list_position]
        add_element1 = add_list[list_position]  # The two lists will always have the same length
        list_within_list = list_within_list.pop()  # I want to remove the last element
        list_with_element1 = list_within_list.insert(0, add_element1)  # I then want to add a new element
        list_with_new_list = list_with_element1.insert(0, ['Column1', 'Column2', 'Column3'])  # Then I want to add a new list to the beginning of the list of lists
new_elements = ['A', 'B', 'C']
original_list_list = [['D', 1, 2], ['E', 3, 4], ['F', 5, 6]]
output_function(new_elements, original_list_list)
My desired output is (ultimately I will turn this into a pandas df):
[['Column1', 'Column2', 'Column3'], ['A', 'D', 1], ['B','E', 3], ['C', 'F', 5]]
Any help is appreciated. Thanks!
I believe you have some misunderstanding of the methods you are calling.
Your comments indicate you intend to throw away the popped element, but you are actually throwing away the list and keeping the element instead.
These 2 lines:
list_within_list = list_within_list.pop() # I want to remove the last element
list_with_element1 = list_within_list.insert(0, add_element1) # I then want to add a new element
One way to accomplish this:
list_with_element1 = list_within_list[:-1]
list_with_element1.insert(0, add_element1)
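For completeness, a sketch of the whole function under those corrections (this is one possible reading of the desired output, not the asker's original code; the header row is prepended once, outside the loop):

def output_function(add_list, list_of_list):
    result = [['Column1', 'Column2', 'Column3']]
    for position in range(len(list_of_list)):
        inner = list_of_list[position][:-1]    # copy without the last element
        inner.insert(0, add_list[position])    # prepend the new element
        result.append(inner)
    return result

new_elements = ['A', 'B', 'C']
original_list_list = [['D', 1, 2], ['E', 3, 4], ['F', 5, 6]]
print(output_function(new_elements, original_list_list))
# [['Column1', 'Column2', 'Column3'], ['A', 'D', 1], ['B', 'E', 3], ['C', 'F', 5]]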
I have some code that is intended to convert a 3-dimensional list to an array. Technically it works in that I get a 3-dimensional array, but indexing only works when I don't iterate across one of the dimensions, and doesn't work if I do.
Indexing works here:
listTempAllDays = []
for j in listGPSDays:
    listTempDay = []
    for i in listGPSDays[0]:
        arrayDay = np.array(i)
        listTempDay.append(arrayDay)
    arrayTemp = np.array(listTempDay)
    listTempAllDays.append(arrayTemp)
arrayGPSDays = np.array(listTempAllDays)
print(arrayGPSDays[0,0,0])
It doesn't work here:
listTempAllDays = []
for j in listGPSDays:
    listTempDay = []
    for i in j:
        arrayDay = np.array(i)
        listTempDay.append(arrayDay)
    arrayTemp = np.array(listTempDay)
    listTempAllDays.append(arrayTemp)
arrayGPSDays = np.array(listTempAllDays)
print(arrayGPSDays[0,0,0])
The difference between the two pieces of code is in the inner for loop. The first piece of code also works for all elements in listGPSDays (e.g. for i in listGPSDays[1]: etc...).
Removing the final print call allows the code to run in the second case, and changing the final line to print(arrayGPSDays[0][0,0]) also runs.
In both cases checking the type at all levels returns <class 'numpy.ndarray'>.
I would like this array indexing to work, if possible - what am I missing?
The following is provided as example data:
Anonymised results from print(arrayGPSDays[0:2,0:2,0:2]), generated using the first piece of code (so that the indexing works! - but also resulting in arrayGPSDays[0] being the same as arrayGPSDays[1]):
[[['1' '2']
  ['3' '4']]

 [['1' '2']
  ['3' '4']]]
numpy's array constructor can handle arbitrarily dimensioned iterables. The only stipulation is that they can't be jagged (i.e. each "row" in each dimension must have the same length).
Here's an example:
In [1]: list_3d = [[['a', 'b', 'c'], ['d', 'e', 'f']], [['g', 'h', 'i'], ['j', 'k', 'l']]]
In [2]: import numpy as np
In [3]: np.array(list_3d)
Out[3]:
array([[['a', 'b', 'c'],
        ['d', 'e', 'f']],

       [['g', 'h', 'i'],
        ['j', 'k', 'l']]], dtype='<U1')
In [4]: array_3d = np.array(list_3d)
In [5]: array_3d[0,0,0]
Out[5]: 'a'
In [6]: array_3d.shape
Out[6]: (2, 2, 3)
If the array is jagged, numpy will "squash" down to the dimension where the jagged-ness happens. Since that explanation is clear as mud, an example might help:
In [20]: jagged_3d = [ [['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h'], ['i', 'j']] ]
In [21]: jagged_arr = np.array(jagged_3d)
In [22]: jagged_arr.shape
Out[22]: (2,)
In [23]: jagged_arr
Out[23]:
array([list([['a', 'b'], ['c', 'd']]),
       list([['e', 'f'], ['g', 'h'], ['i', 'j']])], dtype=object)
The reason the constructor isn't working out of the box is that you have a jagged array. numpy simply does not support jagged arrays, because each numpy array has a well-defined shape representing the length of each dimension. If the items in a given dimension have different lengths, this abstraction falls apart, and numpy simply doesn't allow it.
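As a quick diagnostic (a sketch, assuming the data are plain nested Python lists), you can check whether the nesting is rectangular before calling np.array:

def is_rectangular(nested):
    # Every "day" must contain the same number of rows,
    # and every row the same number of columns.
    row_counts = {len(day) for day in nested}
    col_counts = {len(row) for day in nested for row in day}
    return len(row_counts) == 1 and len(col_counts) == 1

print(is_rectangular(list_3d))    # True
print(is_rectangular(jagged_3d))  # False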
HTH.
So Isaac, it seems your code has some syntax misinterpretations.
In your for statement, j represents an ITEM inside the list listGPSDays (I assume it is a list), not the ITEM INDEX inside the list, and you don't need to "get" the range of the list; Python can do that for you. Try:
for j in listGPSDays:
instead of
for j in range(len(listGPSDays)):
Also, try changing this line of code from:
for i in listGPSDays[j]:
to:
for i in listGPSDays.index(j):
I think it will solve your problem, hope it works!
Python 2.7, numpy, create levels in the form of a list of factors.
I have a data file which list independent variables, the last column indicates the class. For example:
2.34,4.23,0.001, ... ,56.44,2.0,"cloudy with a chance of rain"
Using numpy, I read all the numeric columns into a matrix, and the last column into an array which I call "classes". In fact, I don't know the class names in advance, so I do not want to use a dictionary. I also do not want to use Pandas. Here is an example of the problem:
>>> import numpy
>>> classes = ['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd']
>>> type(classes)
<type 'list'>
>>> classes = numpy.array(classes)
>>> type(classes)
<type 'numpy.ndarray'>
>>> classes
array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'],
      dtype='|S1')
# requirements call for a list like this:
# [0, 1, 2, 2, 1, 0, 0, 3]
Note that the target class may be very sparse, for example, a 'z', in perhaps 1 out of 100,000 cases. Also note that the classes may be arbitrary strings of text, for example, scientific names.
I'm using Python 2.7 with numpy, and I'm stuck with my environment. Also, the data has been preprocessed, so it's scaled and all values are valid. I do not want to preprocess the data a second time to extract the unique classes and build a dictionary before I process the data. What I'm really looking for is the Python equivalent of the stringsAsFactors parameter in R, which automatically converts a string vector to a factor vector when the script reads the data.
Don't ask me why I'm using Python instead of R - I do what I'm told.
Thanks, CC.
You could use np.unique with return_inverse=True to return both the unique class names and an array of corresponding integer indices:
import numpy as np
classes = np.array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'])
classnames, indices = np.unique(classes, return_inverse=True)
print(classnames)
# ['a' 'b' 'c' 'd']
print(indices)
# [0 1 2 2 1 0 0 3]
print(classnames[indices])
# ['a' 'b' 'c' 'c' 'b' 'a' 'a' 'd']
The class names will be sorted in lexical order.
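As a small follow-on (not part of the answer above), the integer codes also make it easy to see how sparse each class is, which is relevant to the rare-class case mentioned in the question:

counts = np.bincount(indices)   # occurrences per class, aligned with classnames
for name, count in zip(classnames, counts):
    print(name, count)
# a 3
# b 2
# c 2
# d 1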
I was wondering if there is a way to join numpy arrays.
Example:
array1 = [[1,c,d], [2,a,b], [3, e,f]]
array2 = [[2,g,g,t], [1,alpha, beta, gamma], [1,t,y,u], [3,dog, cat, fish]]
I need to join these arrays, but the NumPy documentation says if the records are not unique, the functions will fail or return unknown results.
Does anyone have a sample of how to do a 1:M join instead of a 1:1 join on numpy arrays? Also, I know my examples are not in the proper numpy format, but they're just to give a general idea.
What you are trying to achieve looks more like building a new nested list from your two input arrays.
Treating them as lists:
list1 = [[1,'c','d'], [2,'a','b'], [3, 'e','f']]
list2 = [[2,'g','g','t'], [1,'alpha', 'beta', 'gamma'], [1,'t','y','u'], [3,'dog', 'cat', 'fish']]
You can build your desired result doing:
result = [i+j[1:] for i in list1 for j in list2 if i[0]==j[0]]
Which will look like this:
[[1, 'c', 'd', 'alpha', 'beta', 'gamma'],
[1, 'c', 'd', 't', 'y', 'u'],
[2, 'a', 'b', 'g', 'g', 't'],
[3, 'e', 'f', 'dog', 'cat', 'fish']]
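If you still need a NumPy array afterwards, the joined rows can be converted directly; a sketch, using dtype=object to keep the mixed int/str values intact:

import numpy as np

joined = np.array(result, dtype=object)
print(joined.shape)  # (4, 6)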