Merging two rows in pandas to one header cell per column - python

I have a table in pandas that looks like this:
0    A         Another
1    header    header
2    First     row
3    Second    row
and I would like to have a table like this:
0    A header    Another header
1    First       row
2    Second      row
How can I merge rows 0 and 1 into a single header for each column?

There is a question about that column '0', if indeed it is one. But if it is not (and is just the index having been pasted slightly incorrectly), then I would do:
newdf = df.iloc[1:].set_axis(df.columns + ' ' + df.iloc[0], axis=1)
>>> newdf
   A header  Another header
1     First             row
2    Second             row
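For reference, a minimal end-to-end sketch of the same idea, assuming 'A' and 'Another' really are the column names and the numbers in the pasted table are just the index:
import pandas as pd

# Rebuild the example: 'A' and 'Another' are columns, row 0 holds the second half of each header
df = pd.DataFrame({'A': ['header', 'First', 'Second'],
                   'Another': ['header', 'row', 'row']})

# Glue the column names onto the first row, use the result as the new header, drop that row
newdf = df.iloc[1:].set_axis(df.columns + ' ' + df.iloc[0], axis=1)
print(newdf)
#    A header  Another header
# 1     First             row
# 2    Second             row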

Related

How to move data from one column to another column's row?

I have the following DataFrame:
0         1      2      3      4
First     row    row    row    row
Second    row    row    row    row
Because my dataframe can be longer, I want to rename the first 3 columns, and then I want the values in the next 3 columns to be appended as a new row under the first 3 columns.
df = df.rename(columns={0: 'data', 1: 'user', 2: 'file'})
data       user      file      3          4         5
dataa_1    user_1    file_1    dataa_2    user_2    file_2
Second     row       row       row        row       row
and then I want to write some code so that the values in the remaining 3 columns are moved to become the second row under my first 3 columns:
data       user      file      3      4      5
dataa_1    user_1    file_1    row    row    row
dataa_2    user_2    file_2    row    row    row
Maybe something like this:
import pandas as pd

df = pd.DataFrame([range(6), range(6)],
                  columns=['first', 'second', 'third', 'fourth', 'fifth', 'six'])
df2 = df[['fourth', 'fifth', 'six']]
df2 = df2.rename(columns={'fourth': 'first', 'fifth': 'second', 'six': 'third'})
df3 = pd.concat([df, df2])
For the following result:
   first  second  third  fourth  fifth  six
0      0       1      2     3.0    4.0  5.0
1      0       1      2     3.0    4.0  5.0
0      3       4      5     NaN    NaN  NaN
1      3       4      5     NaN    NaN  NaN
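A minimal sketch applying the same concat idea to the frame described in the question, assuming the integer-labelled columns 3, 4, 5 simply hold a second data/user/file triple (the sample values are made up to match the question's layout):
import pandas as pd

# Hypothetical one-row frame shaped like the question: columns 0-2 and 3-5 each hold a triple
df = pd.DataFrame([['dataa_1', 'user_1', 'file_1', 'dataa_2', 'user_2', 'file_2']])
df = df.rename(columns={0: 'data', 1: 'user', 2: 'file'})

# Give the second triple the same column names, then stack it underneath the first one
extra = df[[3, 4, 5]].rename(columns={3: 'data', 4: 'user', 5: 'file'})
result = pd.concat([df[['data', 'user', 'file']], extra], ignore_index=True)
print(result)
#       data    user    file
# 0  dataa_1  user_1  file_1
# 1  dataa_2  user_2  file_2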

Python Pandas Drop Consecutive Data Frames but Period (.) at the End is the Differentiator

Hi I have a section of my pandas dataframe that has duplicates, but the difference is minor.
The only differentiator is a period at the end.
Header A
First
First.
I just want to drop the row that has a duplicate that does not have a period.
First sort by Header A, then strip the trailing . and keep the last of each group of duplicated values with Series.duplicated:
print(df)
Header A
0 First.
1 First
2 First.
3 Second.
4 Second
5 Third
6 Third
df1 = df.sort_values('Header A')
df1 = df1[~df1['Header A'].str.rstrip('.').duplicated(keep='last')]
print(df1)
Header A
2 First.
3 Second.
6 Third
If you need to prioritize the values without .:
df1 = df.sort_values('Header A')
df2 = df1[~df1['Header A'].str.rstrip('.').duplicated()]
print(df2)
Header A
1 First
4 Second
5 Third
Or try loc:
>>> x = df['Header A'].str.split('.', expand=True)
>>> df.loc[x[0].duplicated(keep=False) & x[1].isna()]
Header A
0 First
>>>
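For completeness, a small self-contained variant that skips the sort and builds a boolean mask directly: drop any row whose period-stripped value is duplicated and which itself lacks the trailing period (the column name and sample values follow the question):
import pandas as pd

# Near-duplicates that differ only by a trailing period
df = pd.DataFrame({'Header A': ['First', 'First.', 'Second.', 'Second', 'Third']})

stem = df['Header A'].str.rstrip('.')           # value with trailing periods removed
has_period = df['Header A'].str.endswith('.')   # True for the '.' variants
is_dup = stem.duplicated(keep=False)            # True wherever the stem appears more than once

# Drop the duplicated rows that do NOT end with a period
result = df[~(is_dup & ~has_period)]
print(result)
#    Header A
# 1    First.
# 2   Second.
# 4     Third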

Append or join a value from one dataframe to every row in another dataframe in Pandas

I'm normally OK on the joining and appending front, but this one has got me stumped.
I've got one dataframe with only one row in it. I have another with multiple rows. I want to append the value from one of the columns of my first dataframe to every row of my second.
df1:

id    Value
1     word

df2:

id    data
1     a
2     b
3     c
Output I'm seeking:

df2:

id    data    Value
1     a       word
2     b       word
3     c       word
I figured that this was along the right lines, but it listed out NaN for all rows:
df2 = df2.append(df1[df1['Value'] == 1])
I guess I could just join on the id value and then copy the value to all rows, but I assumed there was a cleaner way to do this.
Thanks in advance for any help you can provide!
Just get the first element in the Value column of df1 and assign it to a Value column of df2:
df2['Value'] = df1.loc[0, 'Value']
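A runnable sketch of that assignment, using .iloc[0] so it works regardless of df1's index labels (data recreated from the question):
import pandas as pd

df1 = pd.DataFrame({'id': [1], 'Value': ['word']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'data': ['a', 'b', 'c']})

# Assigning a scalar broadcasts it to every row of df2
df2['Value'] = df1['Value'].iloc[0]
print(df2)
#    id  data  Value
# 0   1     a   word
# 1   2     b   word
# 2   3     c   word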

Pandas read_csv skiprows with conditional statements

I have a bunch of txt files that I need to compile into a single master file. I use read_csv to extract the information inside. There are some rows to drop, and I was wondering if it's possible to use the skiprows feature without specifying the index numbers of the rows I want to drop, but rather to tell it which rows to drop according to their content/value. Here's what the data looks like, to illustrate my point:
Index Column 1 Column 2
0 Rows to drop Rows to drop
1 Rows to drop Rows to drop
2 Rows to drop Rows to drop
3 Rows to keep Rows to keep
4 Rows to keep Rows to keep
5 Rows to keep Rows to keep
6 Rows to keep Rows to keep
7 Rows to drop Rows to drop
8 Rows to drop Rows to drop
9 Rows to keep Rows to keep
10 Rows to drop Rows to drop
11 Rows to keep Rows to keep
12 Rows to keep Rows to keep
13 Rows to drop Rows to drop
14 Rows to drop Rows to drop
15 Rows to drop Rows to drop
What is the most effective way to do this?
Is this what you want to achieve?
import pandas as pd
df = pd.DataFrame({'A': ['row 1', 'row 2', 'drop row', 'row 4', 'row 5',
                         'drop row', 'row 6', 'row 7', 'drop row', 'row 9']})
df1 = df[df['A'] != 'drop row']
print(df)
print(df1)
Original Dataframe:
A
0 row 1
1 row 2
2 drop row
3 row 4
4 row 5
5 drop row
6 row 6
7 row 7
8 drop row
9 row 9
New DataFrame with rows dropped:
A
0 row 1
1 row 2
3 row 4
4 row 5
6 row 6
7 row 7
9 row 9
While you cannot skip rows based on content, you can skip rows based on index. Here are some options for you:
Skip the first n rows:
df = pd.read_csv('xyz.csv', skiprows=2)
# this will skip 2 rows from the top

Skip specific rows:
df = pd.read_csv('xyz.csv', skiprows=[0, 2, 5])
# this will skip rows 1, 3, and 6 from the top
# remember row 0 is the 1st line

Skip every nth row in the file:
# You can also skip by counts.
# In the example below, skip the 0th row and every 5th row from there on.
def check_row(a):
    if a % 5 == 0:
        return True
    return False

df = pd.read_csv('xyz.txt', skiprows=lambda x: check_row(x))
More details can be found in the pandas read_csv documentation for skiprows.
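As a quick self-contained check of the callable form, here is a sketch that uses an in-memory CSV (io.StringIO stands in for a real file path; the data is made up):
import io
import pandas as pd

# In-memory stand-in for one of the txt files
csv_text = "A,B\n1,a\n2,b\n3,c\n4,d\n5,e\n"

# The callable receives the 0-based line number; return True to skip that line.
# Keep line 0 (the header) and drop every second data line after it.
df = pd.read_csv(io.StringIO(csv_text), skiprows=lambda x: x > 0 and x % 2 == 0)
print(df)
#    A  B
# 0  1  a
# 1  3  c
# 2  5  e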
No. skiprows will not allow you to drop based on the row content/value.
Based on Pandas Documentation:
skiprows : list-like, int or callable, optional
    Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
    If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
Since you cannot do that using skiprows, the most efficient way I can think of is to read the file and then filter on the column value:
df = pd.read_csv(filePath)
df = df.loc[df['Column 1'] == "Rows to keep"]
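And the read-then-filter approach end to end, again with io.StringIO standing in for one of the txt files and column names taken from the question's example:
import io
import pandas as pd

csv_text = (
    "Column 1,Column 2\n"
    "Rows to drop,Rows to drop\n"
    "Rows to keep,Rows to keep\n"
    "Rows to drop,Rows to drop\n"
    "Rows to keep,Rows to keep\n"
)

df = pd.read_csv(io.StringIO(csv_text))
df = df.loc[df['Column 1'] == 'Rows to keep']
print(df)
#        Column 1      Column 2
# 1  Rows to keep  Rows to keep
# 3  Rows to keep  Rows to keep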

How to hide axis labels in python pandas dataframe?

I've used the following code to generate a dataframe which is supposed to be the input for a seaborn plot.
import numpy as np
import pandas as pd

data_array = np.array([['index', 'value']])
for x in range(len(value_list)):
    data_array = np.append(data_array, np.array([[int(x + 1), int(value_list[x])]]), axis=0)
data_frame = pd.DataFrame(data_array)
The output looks something like this:
0 1
0 index values
1 1 value_1
2 2 value_2
3 3 value_3
However, with this dataframe, seaborn returns an error. When comparing my data to the examples, I see that the first row is missing. The samples, being loaded in with load_dataset(), look something like this:
0 index values
1 1 value_1
2 2 value_2
3 3 value_3
How do I remove the first row of axis labels of my dataframe so that it looks like the samples provided? Removing the first row removes the strings "index" and "values", but not the axis label.
NumPy treats the 'index'/'value' row as just another data row of the dataframe, not as the column names.
I think this would be a more pythonic way of doing it:
pd.DataFrame(list(enumerate(value_list, 1)), columns=['index', 'values'])
I don't know what value_list is. However, I would recommend another way to create the dataframe:
import pandas as pd

value_list = ['10', '20', '30']
data_frame = pd.DataFrame({
    'index': range(len(value_list)),
    'value': [int(x) for x in value_list]})
data_frame:
index value
0 0 10
1 1 20
2 2 30
Now you can easily change the dataframe index and the 'index' column:
data_frame.loc[:, 'index'] += 1
data_frame.index += 1
data_frame:
index value
1 1 10
2 2 20
3 3 30
Try:
new_header = df.iloc[0]   # grab the first row for the header
df = df[1:]               # take the data less the header row
df.columns = new_header   # set the header row as the df header
Just slice your dataframe:
df = data_frame[1:]
df.columns = data_frame.iloc[0]  # this will set the column names
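For completeness, the enumerate construction from the first answer as a runnable sketch; the value_list contents are made up, and the strings are converted to int (an addition to the original one-liner) so seaborn receives numeric data. Building the frame with real column names from the start avoids the stray header row entirely:
import pandas as pd

value_list = ['10', '20', '30']  # hypothetical stand-in for the question's list

# Column names are set up front, so no header text ends up inside the data
data_frame = pd.DataFrame(list(enumerate((int(v) for v in value_list), 1)),
                          columns=['index', 'values'])
print(data_frame)
#    index  values
# 0      1      10
# 1      2      20
# 2      3      30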
