Cannot sort dataframe – the column label is not unique


I merge three dataframes with the first line and try to sort them with the second. This used to work fine, but now I get this error (our company may have updated the Python version in the meantime):
ValueError: The column label 'Areanr' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
The code looks like this:
pref_info4 = pref_info1.append(pref_info2).append(pref_info3)
pref_info4 = pref_info4.sort_values(['Areanr','nr'])
The second line gives the error. When I inspect pref_info4 after the first line has run, there is only one column with the label 'Areanr'. Are there some hidden labels that I need to remove? Otherwise it should be unique, right? Each of the original dataframes has the columns Areanr and nr, but this used to work fine (and I cannot see any bad merging issue when inspecting pref_info4...)
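A quick way to look for "hidden" labels is to ask pandas directly which column labels repeat and whether the columns form a MultiIndex. The sketch below is an assumption-laden stand-in: the three small frames are made up and only the names Areanr and nr come from the question. It also swaps append for pd.concat, since DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. As far as I can tell from the pandas source, the second sentence of the error ("For a multi-index ...") is only added when the columns are a MultiIndex, so the isinstance check is worth running on the real pref_info4.

```python
import pandas as pd

# Hypothetical stand-ins for pref_info1..3; only the column names
# 'Areanr' and 'nr' come from the question.
pref_info1 = pd.DataFrame({"Areanr": [3, 1], "nr": [2, 1]})
pref_info2 = pd.DataFrame({"Areanr": [2], "nr": [5]})
pref_info3 = pd.DataFrame({"Areanr": [1], "nr": [0]})

# DataFrame.append no longer exists in pandas 2.x; pd.concat stacks the rows instead.
pref_info4 = pd.concat([pref_info1, pref_info2, pref_info3])

# Column labels that occur more than once (an empty list if all are unique).
duplicate_labels = pref_info4.columns[pref_info4.columns.duplicated()].tolist()

# With MultiIndex columns, 'Areanr' can refer to several columns at once,
# which makes sort_values raise the "not unique" error.
has_multiindex_columns = isinstance(pref_info4.columns, pd.MultiIndex)
print(duplicate_labels, has_multiindex_columns)

pref_info4 = pref_info4.sort_values(["Areanr", "nr"])
print(pref_info4)
```

If either check reports a problem on the real data, renaming or flattening the columns before sorting should make the labels unique again.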

You can try to assign a running ID to each column so that the labels are unique, and then sort them by this running total.
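A minimal sketch of the running-ID idea, assuming the duplicates are plain repeated labels (not a MultiIndex). The helper name make_labels_unique and the "_1", "_2" suffix scheme are hypothetical choices for illustration, not a pandas feature.

```python
import pandas as pd


def make_labels_unique(df):
    """Return a copy of df whose repeated column labels get a running suffix.

    The first occurrence keeps its name; later ones become 'label_1',
    'label_2', ... (hypothetical naming scheme for this sketch).
    """
    seen = {}
    new_labels = []
    for label in df.columns:
        count = seen.get(label, 0)
        new_labels.append(label if count == 0 else f"{label}_{count}")
        seen[label] = count + 1
    out = df.copy()
    out.columns = new_labels
    return out


# A frame where 'Areanr' appears twice reproduces the error.
df = pd.DataFrame([[2, 9, 1], [1, 8, 0]], columns=["Areanr", "Areanr", "nr"])
try:
    df.sort_values(["Areanr", "nr"])
except ValueError as err:
    print("before renaming:", err)

fixed = make_labels_unique(df)
print(fixed.columns.tolist())
print(fixed.sort_values(["Areanr", "nr"]))
```

The suffix could collide with a label that already exists (for example a real column called Areanr_1), so it is worth checking fixed.columns.is_unique afterwards.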

Related

Is there a built-in function in pandas that does the following?

I have two dataframes, df1 and df2. They have a column with common values, but in df1['comon_values_column'] every value appears only once, while in df2['comon_values_column'] each value can appear more than once. I want to see if I can do the following in a single line and without a loop:
for value in df2['comon_values_column']:
    df2['empty_column'].loc[df2['comon_values_column']==value]=df1['other_column'].loc[df1['comon_values_column']==value]
I have tried to use merge, but because of the size of the dataframe it is very difficult to make sure that it does exactly what I want.
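Since the keys in df1 are unique, one loop-free option is Series.map with a lookup Series built from df1. The data below is made up; only the column names come from the question. For the worry about merge, its validate argument makes pandas raise an error if the key relationship is not what you expect, which gives a cheap correctness check on large frames.

```python
import pandas as pd

# Made-up data; column names copied from the question.
df1 = pd.DataFrame({"comon_values_column": ["a", "b", "c"],
                    "other_column": [10, 20, 30]})
df2 = pd.DataFrame({"comon_values_column": ["b", "a", "b", "d"]})

# One line, no loop: look up each df2 key in a Series indexed by df1's keys.
# Keys missing from df1 (here 'd') become NaN.
lookup = df1.set_index("comon_values_column")["other_column"]
df2["empty_column"] = df2["comon_values_column"].map(lookup)
print(df2)

# Equivalent merge; validate="many_to_one" raises MergeError if df1's keys
# turn out not to be unique, and a left merge keeps df2's row count.
merged = df2[["comon_values_column"]].merge(
    df1, on="comon_values_column", how="left", validate="many_to_one")
print(len(merged) == len(df2))
```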

Problems with DataFrame indexing with pandas

Using pandas, I have to modify a DataFrame so that it only keeps the indexes that are also present in a vector, which was obtained by performing operations on one of the df's columns. Here's the specific line of code used for that (please ignore my choice of the name 'dataset' instead of 'dataframe' or 'df'):
dataset = dataset.iloc[list(set(dataset.index).intersection(set(vector.index)))]
It worked, and the attached image shows the df and some of its indexes. However, when I try to access a specific value by index in the new 'dataset', as in the line below, I get the error: single positional indexer is out-of-bounds
print(dataset.iloc[:, 21612])
Note: I've also tried the following, to make sure the problem isn't simply that I don't know how to use iloc:
print(dataset.iloc[21612, :])
and
print(dataset.iloc[21612])
Do I have to create another column to "mimic" the actual indexes? What am I doing wrong? Note that the indexes must not change at all, even though the size of the DataFrame changes. E.g. if the DataFrame originally had 21000 rows and the new one only 15000, I still need to use the number 20999 as an index if it passed the intersection check shown in the first code snippet. Thanks in advance.

Try this:
print(dataset.loc[21612, :])
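iloc[] selects rows by position, not by label. After you have eliminated some of the original rows, its first (i.e., row) argument must not be greater than len(dataset.index) - 1. loc[] looks rows up by their index label, so the original numbers such as 21612 keep working after filtering.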
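A small sketch of the label versus position difference. The ten-row frame and the vector Series are hypothetical stand-ins. It also filters with Index.intersection and loc, which is label-based from the start (the original iloc version only worked because the index happened to be 0..n-1), and sorts the index because a Python set has no guaranteed order.

```python
import pandas as pd

dataset = pd.DataFrame({"value": range(10)})  # default index 0..9
# Hypothetical result of the column operations; only its index matters here.
vector = pd.Series([1.0, 1.0, 1.0], index=[9, 2, 5])

# Keep rows whose labels also appear in vector; the original labels survive.
kept = dataset.loc[dataset.index.intersection(vector.index)].sort_index()

print(kept.index.tolist())    # the surviving labels
print(kept.loc[9, "value"])   # label 9 still exists
print(kept.iloc[2]["value"])  # position 2 is the third remaining row (label 9)

try:
    kept.iloc[9]  # only 3 rows remain, so position 9 is out of bounds
except IndexError as err:
    print("iloc error:", err)
```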

How to append dataframes in Pandas without staggered format

I was able to append dataframes, but as they are added, each one appears after the end of the one appended before it, and so on.
Each dataframe has a different header name.
Here's what I've tried so far:
df1 = df1.append(dforiginal, sort=False, ignore_index=False)
What's more, every time they are appended, their index starts again at 0. Is it possible to append the dataframes so that they all start at Index=0?
The screenshots below show what I'm getting (top image) and what I'm trying to accomplish (bottom image).
Thanks.

If I understand your point correctly, you want to add rows instead of columns to your DataFrame, don't you?
Either way, this page gives a general overview of how to use the append function and related tools: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Moreover, you can reset the index by setting the keyword ignore_index to True.
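If the goal in the bottom screenshot is to have the frames next to each other, all starting at index 0, a sketch with pd.concat(axis=1) is below. The frames and their headers are hypothetical; the step that matters is resetting each index before concatenating, because axis=1 aligns rows by index label. Row-wise concatenation (what append does) is included for comparison and produces the staggered layout when headers differ.

```python
import pandas as pd

# Hypothetical frames with different headers, standing in for the question's data.
dforiginal = pd.DataFrame({"A": [1, 2, 3]})
df_b = pd.DataFrame({"B": [4, 5]}, index=[7, 8])  # index does not start at 0
df_c = pd.DataFrame({"C": [6, 7, 8, 9]})
frames = [dforiginal, df_b, df_c]

# Row-wise, like append: different headers give a staggered block of NaNs.
stacked = pd.concat(frames, ignore_index=True)

# Reset every index to 0..n-1, then place the frames side by side.
side_by_side = pd.concat([f.reset_index(drop=True) for f in frames], axis=1)
print(side_by_side)
```

Shorter frames are padded with NaN at the bottom, which turns integer columns into floats.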

How to use DataFrame.isin without the constraint of having to match both index and value?

So, I have two files, one with 6 million entries and the other with around 5 million entries. I want to compare the values of a particular column in both dataframes. This is the code that I have used:
print(df1['Col1'].isin(df2['col3']).value_counts())
This is essential for me, as I want to see the number of True (same) and False (different). Around 95% of the entries come out as True, but about 5% come out as False. I extracted this data using to_csv and compared the columns using vimdiff, and they are all identical, so why is the code labelling them as False (different)? Is there a better, more foolproof method?
Note: I have checked the columns for whitespace as well. There is no whitespace.
PS: The pandas isin documentation states that both index and value have to match. Since one file has more entries, the index does not match for these entries. How can I remove that constraint?

First, convert the column you pass as the parameter to your isin() method into a list.
Then use the resulting boolean mask to filter your df1 dataframe, because you need to get the value counts of the same column you filtered.
From your example:
print(df1[df1['Col1'].isin(df2['col3'].values.tolist())]['Col1'].value_counts())
Try running that again.
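Two points may help here. Series.isin compares values only; the index-and-value matching in the documentation applies to DataFrame.isin when it is given a Series or DataFrame, so df1['Col1'].isin(df2['col3']) is not affected by the different lengths. A common cause of "identical in the CSV but False in pandas" is a dtype mismatch, e.g. one column read as strings and the other as integers; that is an assumption about your data, not something the question confirms. The sketch below uses made-up values to show the effect and one way to normalize both sides.

```python
import pandas as pd

# Made-up data: the same numbers, but Col1 holds strings and col3 integers.
df1 = pd.DataFrame({"Col1": ["101", "102", "103", "104"]})
# df2 has a different length and index; Series.isin ignores the index.
df2 = pd.DataFrame({"col3": [104, 101, 102]}, index=[10, 11, 12])

print(df1["Col1"].dtype, df2["col3"].dtype)  # compare the dtypes first

raw = df1["Col1"].isin(df2["col3"])
print(raw.value_counts())  # all False here: the string '101' never equals the integer 101

# Normalize both sides to stripped strings before comparing.
matches = df1["Col1"].astype(str).str.strip().isin(
    df2["col3"].astype(str).str.strip())
print(matches.value_counts())
```

If the dtypes already agree on the real data, looking at df1.loc[~raw, 'Col1'].map(repr).head() can reveal differences that a CSV export hides, such as 101 versus 101.0.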

Make the data from the second column stay at the second column

I'm making a form using reportlab, and it's in two columns. The second column is just a copy of the first column.
I used the Frame() function to create two columns, and I used a Spacer() to separate the original form from the copied form into two columns.
My expected result is that the data in the second column stays in place. But what I get is that when the data in the first column gets shorter, the second column starts shifting up and moves into the first column.

If I understand your question correctly, the problem is that you use a spacer to control the visual placement of the contents in two columns/frames. That way you are treating it as a single long column split in two, whereas you need to treat it as two separate columns (two separate frames).
Therefore you will get greater control if you end the first frame (with FrameBreak()) before you start filling the other, and only use the spacer to control visual design within the same frame.
Tools you need to be aware of are:
FrameBreak(): if you search for it, you will find many code examples.
E.g. you fill frame 1 with 10 lines of text, then insert a FrameBreak(), and the script starts filling the second column.
Another tool you should be aware of is the settings used, e.g., for BaseDocTemplate:
allowSplitting: if set to 1, flowables (e.g. paragraphs) may be split across frames or pages. If 0, each flowable is kept whole within a single frame. (default: 1, disabled with 0).

Develop Reference
