I have the following DataFrame:
        0    1    2    3    4
0   First  row  row  row  row
1  Second  row  row  row  row
Because my DataFrame can be longer, I want to rename the first 3 columns, and then I want the values of the next 3 columns to be moved down as additional rows under the first 3 columns.

df = df.rename(columns={0: 'data', 1: 'user', 2: 'file'})
      data    user    file        3       4       5
0  dataa_1  user_1  file_1  dataa_2  user_2  file_2
1   Second     row     row      row     row     row
and then I want to write some code so that the values of the last 3 columns are moved down as a new row under my first columns:
      data    user    file    3    4    5
0  dataa_1  user_1  file_1  row  row  row
1  dataa_2  user_2  file_2  row  row  row
Maybe something like this:

import pandas as pd

df = pd.DataFrame([range(6), range(6)],
                  columns=['first', 'second', 'third', 'fourth', 'fifth', 'six'])
df2 = df[['fourth', 'fifth', 'six']]
df2 = df2.rename(columns={'fourth': 'first', 'fifth': 'second', 'six': 'third'})
df3 = pd.concat([df, df2])
This stacks the renamed columns underneath the original DataFrame (in the appended rows, the 'fourth'/'fifth'/'six' columns are filled with NaN).
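Building on that idea, here is a self-contained sketch that also drops the now-duplicated columns and keeps each moved row directly below its original row (the column names and values are illustrative, not the asker's real data):

```python
import pandas as pd

# Hypothetical 6-column frame standing in for the real data
df = pd.DataFrame([range(6), range(6)],
                  columns=['first', 'second', 'third', 'fourth', 'fifth', 'six'])

# Take the last three columns and rename them to match the first three
moved = df[['fourth', 'fifth', 'six']].rename(
    columns={'fourth': 'first', 'fifth': 'second', 'six': 'third'})

# Stack them under the first three columns; a stable sort on the old
# index interleaves each moved row with its original row
result = (pd.concat([df[['first', 'second', 'third']], moved])
          .sort_index(kind='mergesort')
          .reset_index(drop=True))
print(result)
```

Each original row is now immediately followed by the row built from its last three columns.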
DF1 = [
   Column A  Column B
     Cell 1    Cell 2
     Cell 3    Cell 4
,
   Column A  Column B
     Cell 1    Cell 2
     Cell 3    Cell 4
]

DF2 = [NY, FL]
In this case DF1 and DF2 each have two elements.
The result I am looking for is the following:

Main_DF = [
   Column A  Column B  Column C
     Cell 1    Cell 2        NY
     Cell 3    Cell 4        NY
,
   Column A  Column B  Column C
     Cell 1    Cell 2        FL
     Cell 3    Cell 4        FL
]
I tried to use pd.concat, assign and insert, but none of them gives me the result I'm looking for.
Lists hold references to dataframes. So, you can amend the dataframes and not need to amend the list at all.
So, I'd do something like...
for df, val in zip(DF1, DF2):
    df['Column C'] = val
Using zip allows you to iterate through the two lists in sync with each other;
1st element of DF1 goes in to df, and 1st element of DF2 goes into val
2nd element of DF1 goes in to df, and 2nd element of DF2 goes into val
and so on
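A runnable sketch of that loop (the cell values are illustrative stand-ins for the real data):

```python
import pandas as pd

# Two small frames standing in for the real list of DataFrames
DF1 = [
    pd.DataFrame({'Column A': ['Cell 1', 'Cell 3'],
                  'Column B': ['Cell 2', 'Cell 4']}),
    pd.DataFrame({'Column A': ['Cell 1', 'Cell 3'],
                  'Column B': ['Cell 2', 'Cell 4']}),
]
DF2 = ['NY', 'FL']

# zip pairs each frame with its value; assigning a scalar
# broadcasts it down the whole new column
for df, val in zip(DF1, DF2):
    df['Column C'] = val
```

Because the list holds references, the frames inside DF1 are modified in place and the list itself never needs to change.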
I have a table in pandas that looks like this:

0       A  Another
1  header   header
2   First      row
3  Second      row
and what I would like to have is a table like this:

0  A header  Another header
1     First             row
2    Second             row
How can I merge rows 0 and 1 into one header row?
There is a question about that column '0', if indeed it is one. But if it is not (and is just the index having been pasted slightly incorrectly), then I would do:
newdf = df.iloc[1:].set_axis(df.columns + ' ' + df.iloc[0], axis=1)
>>> newdf
  A header Another header
1    First            row
2   Second            row
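That one-liner as a runnable sketch, assuming 'A' and 'Another' really are the column names and row 0 holds the second header line:

```python
import pandas as pd

df = pd.DataFrame({'A': ['header', 'First', 'Second'],
                   'Another': ['header', 'row', 'row']})

# Glue the existing column names to the values in row 0,
# use the result as the new header, and drop row 0 itself
newdf = df.iloc[1:].set_axis(df.columns + ' ' + df.iloc[0], axis=1)
print(newdf)
```

If column '0' really is a data column rather than the pasted index, it would need to be set as the index (or dropped) first.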
I'm normally OK on the joining and appending front, but this one has got me stumped.
I've got one dataframe with only one row in it. I have another with multiple rows. I want to append the value from one of the columns of my first dataframe to every row of my second.
df1:

  id  Value
   1   word

df2:

  id  data
   1     a
   2     b
   3     c
Output I'm seeking:

df2:

  id  data  Value
   1     a   word
   2     b   word
   3     c   word
I figured that this was along the right lines, but it listed out NaN for all rows:
df2 = df2.append(df1[df1['Value'] == 1])
I guess I could just join on the id value and then copy the value to all rows, but I assumed there was a cleaner way to do this.
Thanks in advance for any help you can provide!
Just get the first element in the Value column of df1 and assign it to a new Value column of df2:

df2['Value'] = df1.loc[0, 'Value']
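As a runnable sketch (assuming df1's index starts at 0, as in the question):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1], 'Value': ['word']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'data': ['a', 'b', 'c']})

# A scalar on the right-hand side is broadcast to every row of df2
df2['Value'] = df1.loc[0, 'Value']
print(df2)
```

If df1's index is not guaranteed to start at 0, df1['Value'].iloc[0] selects by position instead of by label.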
I have a bunch of txt files that I need to compile into a single master file. I use read_csv to extract the information inside. There are some rows to drop, and I was wondering if it's possible to use the skiprows feature without specifying the index numbers of the rows I want to drop, but rather to tell it which rows to drop according to their content/value. Here's what the data looks like, to illustrate my point.
Index Column 1 Column 2
0 Rows to drop Rows to drop
1 Rows to drop Rows to drop
2 Rows to drop Rows to drop
3 Rows to keep Rows to keep
4 Rows to keep Rows to keep
5 Rows to keep Rows to keep
6 Rows to keep Rows to keep
7 Rows to drop Rows to drop
8 Rows to drop Rows to drop
9 Rows to keep Rows to keep
10 Rows to drop Rows to drop
11 Rows to keep Rows to keep
12 Rows to keep Rows to keep
13 Rows to drop Rows to drop
14 Rows to drop Rows to drop
15 Rows to drop Rows to drop
What is the most effective way to do this?
Is this what you want to achieve?

import pandas as pd

df = pd.DataFrame({'A': ['row 1', 'row 2', 'drop row', 'row 4', 'row 5',
                         'drop row', 'row 6', 'row 7', 'drop row', 'row 9']})
df1 = df[df['A'] != 'drop row']
print(df)
print(df1)
Original Dataframe:
A
0 row 1
1 row 2
2 drop row
3 row 4
4 row 5
5 drop row
6 row 6
7 row 7
8 drop row
9 row 9
New DataFrame with rows dropped:
A
0 row 1
1 row 2
3 row 4
4 row 5
6 row 6
7 row 7
9 row 9
While you cannot skip rows based on content, you can skip rows based on index. Here are some options for you:

Skip the first n rows:

df = pd.read_csv('xyz.csv', skiprows=2)
# this skips the first 2 rows of the file

Skip specific rows:

df = pd.read_csv('xyz.csv', skiprows=[0, 2, 5])
# this skips rows 1, 3, and 6 from the top
# (remember, row 0 is the 1st line)

Skip every nth row in the file:

# you can also skip by counts: here, skip the 0th row
# and every 5th row from there on
def check_row(a):
    return a % 5 == 0

df = pd.read_csv('xyz.txt', skiprows=check_row)

More details can be found in the pandas read_csv documentation for skiprows.
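The callable form of skiprows, as a runnable sketch on an in-memory file (the line contents are made up for illustration):

```python
import io
import pandas as pd

# 12 single-column lines standing in for a headerless text file
data = "\n".join(f"line{i}" for i in range(12))

# The callable receives each file line's index and returns True to skip it;
# here it skips line 0 and every 5th line after it (lines 0, 5, 10)
df = pd.read_csv(io.StringIO(data), header=None,
                 skiprows=lambda x: x % 5 == 0)
print(df)
```

Note that with a real header row, the callable must return False for line 0, or the header itself gets skipped.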
No. skiprows will not allow you to drop based on the row content/value.
Based on Pandas Documentation:
skiprows : list-like, int or callable, optional
Line numbers to skip (0-indexed) or
number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False
otherwise. An example of a valid callable argument would be lambda x:
x in [0, 2].
Since you cannot do that using skiprows, I would do it this way instead, which is still efficient:

df = pd.read_csv(filePath)
df = df.loc[df['column1'] == "Rows to keep"]
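A runnable version of that read-then-filter approach, using an in-memory CSV ('column1'/'column2' stand in for the real headers):

```python
import io
import pandas as pd

text = """column1,column2
Rows to drop,a
Rows to keep,b
Rows to keep,c
Rows to drop,d
"""

# Read everything first, then keep only the rows whose content matches
df = pd.read_csv(io.StringIO(text))
df = df.loc[df['column1'] == "Rows to keep"]
print(df)
```

This trades a slightly larger initial read for the ability to filter on any column value, which skiprows cannot do.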
I have a csv file with 1000 rows and 1000 columns. I have just found that I can access an element at a specific row and column using

df = pd.read_csv('name.csv', sep=",")
print(df.iloc[120, 250])

which reads the element at row 120 and column 250.

But my question is: how can I access an element by the name of its column and the number of its row, instead of by the positions of both?

For example, say the first row of column 1 is 23, for column 2 I have 43, and for column 3 the first row is 55. If I write df.iloc[0, 2] I get 55, and df.iloc[0, 0] is 23. Instead of writing the position of the column (for example 2 or 0 or 6), I want the code to give me those values (55 or 23) by column name.
If you are talking about accessing a value by the name of the column and the number of the row, use:

df.at[index, column]
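A minimal sketch (the column names are made up for illustration; the first row holds 23, 43, 55 as in the question):

```python
import pandas as pd

df = pd.DataFrame([[23, 43, 55]], columns=['colA', 'colB', 'colC'])

# Scalar access by row label and column name
value = df.at[0, 'colC']
```

df.loc[0, 'colC'] works the same way and also accepts slices; .at is the faster accessor when you want exactly one scalar.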