I have a data frame named df1 like this:
as_id TCGA_AF_2687 TCGA_AF_2689_Norm TCGA_AF_2690 TCGA_AF_2691_Norm
31 1 5 9 2
I want to select all the columns whose names end with "Norm". I have tried the code below:
import os
print(os.getcwd())
os.chdir('E:/task')
import pandas as pd
df1 = pd.read_table('haha.txt')
Norms = []
for s in df1.columns:
    if s.endswith('Norm'):
        Norms.append(s)
print(Norms)
but I only get a list of names. What can I do to select the columns including their values, rather than just the column names? I know it may be a silly question, but I am a beginner and would really appreciate some help. Thank you so much for your kindness and your time.
df1[Norms] will get the actual columns from df1.
As a matter of fact the whole code can be simplified to
import os
import pandas as pd
os.chdir('E:/task')
df1 = pd.read_table('haha.txt')
norm_df = df1[[column for column in df1.columns if column.endswith('Norm')]]
One can also use the filter higher-order function:
newdf = df1[list(filter(lambda x: x.endswith("Norm"), df1.columns))]
print(newdf)
Output:
   TCGA_AF_2689_Norm  TCGA_AF_2691_Norm
0                  5                  2
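The same selection can also be written with pandas' own string and filter methods. This is only a minimal sketch: the frame below is a stand-in built from the single sample row in the question, not the real haha.txt file.

```python
import pandas as pd

# Stand-in for the data in the question (the real file is not available here).
df1 = pd.DataFrame(
    {
        "as_id": [31],
        "TCGA_AF_2687": [1],
        "TCGA_AF_2689_Norm": [5],
        "TCGA_AF_2690": [9],
        "TCGA_AF_2691_Norm": [2],
    }
)

# Boolean mask over the column labels, used with .loc to keep matching columns.
norm_by_mask = df1.loc[:, df1.columns.str.endswith("Norm")]

# DataFrame.filter with a regular expression anchored at the end of the name.
norm_by_regex = df1.filter(regex="Norm$")

print(norm_by_mask)
```

Both variants return a DataFrame with the values, not just the names, so either can replace the list comprehension.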
I have saved out a data column as follows:
[[A,1], [B,5], [C,18]....]
I was hoping to group A, B, C into a Category column and 1, 5, 18 into a Values/Series column, so that I can update my PowerPoint chart using python-pptx.
Example:
Category  Values
A         1
B         5
Is there any way I can do it? Currently the above example is also extracted as strings, so I believe I have to convert it to lists first?
Thanks in advance!
Try to parse your strings (a list of lists) then create your dataframe from the real list:
import pandas as pd
import re
s = '[[A,1], [B,5], [C,18]]'
cols = ['Category', 'Values']
data = [row.split(',') for row in re.findall(r'\[([^]]+)\]', s[1:-1])]
df = pd.DataFrame(data, columns=cols)
print(df)
# Output:
  Category Values
0        A      1
1        B      5
2        C     18
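For a chart, the values usually need to be numbers and the two columns need to be separate sequences. The sketch below is one possible way to get there; it assumes every label is a plain word and every value is a whole number, which may not hold for the real data.

```python
import re

import pandas as pd

s = "[[A,1], [B,5], [C,18]]"

# Capture "label,number" pairs; assumes word-like labels and integer values.
pairs = re.findall(r"\[\s*(\w+)\s*,\s*(\d+)\s*\]", s)

df = pd.DataFrame(pairs, columns=["Category", "Values"])
df["Values"] = df["Values"].astype(int)  # regex captures are strings

# Plain Python lists, ready to hand to a charting library.
categories = df["Category"].tolist()
values = df["Values"].tolist()
print(categories, values)
```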
You should be able to just use pandas.DataFrame and pass in your data, unless I'm misunderstanding the question. Anyway, try:
df = pandas.DataFrame(data=d, columns = ['Category', 'Value'])
where d is your list of [category, value] pairs (lists or tuples both work).
from prettytable import PrettyTable
column = [["A",1],["B",5],["C",18]]
columnname = []
columnvalue = []
t = PrettyTable(['Category', 'Values'])
for data in column:
    columnname.append(data[0])
    columnvalue.append(data[1])
    t.add_row([data[0], data[1]])
print(t)
I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that in column "FBgn", there is a mix of FBgn and FBtr string values. I would like to replace the FBtr-containing values with FBgn values provided in the adjacent column called "## FlyBase_FBgn". However, I want to keep the FBgn values in column "FBgn". Maybe keep in mind that I am showing only a portion of the dataframe (reality: 1432 rows). How would I do that? I tried the replace() method from Pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
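To see what this assignment does, here is a small sketch on made-up rows (the IDs are invented for illustration; only the two column names come from the question). Passing na=False to str.contains is an addition here, so that missing values in "FBgn" are treated as non-matches instead of breaking the mask.

```python
import pandas as pd

# Hypothetical rows; the IDs are made up for illustration.
df = pd.DataFrame(
    {
        "FBgn": ["FBtr0001", "FBgn0002", "FBtr0003"],
        "## FlyBase_FBgn": ["FBgn1111", "FBgn0002", "FBgn3333"],
    }
)

# True where the value is an FBtr ID; na=False treats missing values as False.
mask = df["FBgn"].str.contains("FBtr", na=False)

# Assignment aligns on the index, so only the masked rows are overwritten.
df.loc[mask, "FBgn"] = df["## FlyBase_FBgn"]
print(df)
```

Rows that already hold an FBgn value are left untouched because they are not selected by the mask.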
Welcome to Stack Overflow. Next time, please provide more information, including your code; it is always helpful.
Please see the code below; I think you need something similar:
import pandas as pd
# Ignore dict1; it only recreates a frame like yours
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'], "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe
print(df)
# Function to replace the values
def replace_values(df):
    for i in range(len(df)):
        if 'tr' in df.loc[i, 'FBgn']:
            # .loc avoids chained assignment, which may not write back to df
            df.loc[i, 'FBgn'] = df.loc[i, 'FBtr']
    return df
df = replace_values(df)
# Print the new df
print(df)
I have a dataframe df1, like this:
date        sentence
29/03/1029  I like you
30/03/2019  You eat cake
and I want to run the functions getVerb and getObj on df1, so that the output looks like this:
date        sentence      verb  object
29/03/1029  I like you    like  you
30/03/2019  You eat cake  eat   cake
I want those functions (getVerb and getObj) to run on each row of df1. Could someone help me solve this problem in an efficient way?
Thank you so much.
Each column of a pandas DataFrame is a Series. You can use the Series.apply or Series.map functions to get the result you want.
df1['verb'] = df1['sentence'].apply(getVerb)
df1['object'] = df1['sentence'].apply(getObj)
# OR
df1['verb'] = df1['sentence'].map(getVerb)
df1['object'] = df1['sentence'].map(getObj)
See the pandas documentation for more details on Series.apply or Series.map.
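A runnable sketch of the apply approach follows. The real getVerb and getObj are not shown in the question, so the two functions below are toy stand-ins that simply pick words by position; they are assumptions for illustration only.

```python
import pandas as pd


def getVerb(sentence):
    # Toy stand-in: treats the second word as the verb.
    return sentence.split()[1]


def getObj(sentence):
    # Toy stand-in: treats the last word as the object.
    return sentence.split()[-1]


df1 = pd.DataFrame(
    {
        "date": ["29/03/1029", "30/03/2019"],
        "sentence": ["I like you", "You eat cake"],
    }
)

# Each function is called once per value in the 'sentence' column.
df1["verb"] = df1["sentence"].apply(getVerb)
df1["object"] = df1["sentence"].apply(getObj)
print(df1)
```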
Assume you have a pandas dataframe such as:
import pandas as pd, numpy as np
df = pd.DataFrame([[4, 9]] *3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9
Let's say we want the sum of columns A and B row-wise and column-wise. To accomplish this, we write
df.apply(np.sum, axis=1)  # row-wise sum
Output:
0    13
1    13
2    13
df.apply(np.sum, axis=0)  # column-wise sum
Output:
A    12
B    27
Now, if you want to apply a function to a specific set of columns, you can select a subset of the data frame.
For example, to compute the sum over column A only:
df['A'].sum()  # Series.apply has no axis argument, so reduce the column directly
See the pandas documentation for DataFrame.apply for more details. Series.map and Series.apply, mentioned in the answer above, can be handy as well.
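For plain sums, pandas' built-in reductions give the same numbers without going through apply. A minimal sketch on the same 3x2 frame:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

row_sums = df.sum(axis=1)  # one total per row
col_sums = df.sum(axis=0)  # one total per column
a_total = df["A"].sum()    # single number for column A only

print(row_sums.tolist(), col_sums.to_dict(), a_total)
```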
Cheers!
Using a simple loop (assuming that columns named 'verb' and 'object' already exist in the data frame):
for index, row in df1.iterrows():
    # .loc with the row label writes into df1 directly, avoiding chained assignment
    df1.loc[index, 'verb'] = getVerb(row['sentence'])
    df1.loc[index, 'object'] = getObj(row['sentence'])
I'm trying to use Python to read my CSV file, extract specific columns into a pandas DataFrame, and show that DataFrame. However, I don't see the data frame; I receive Series([], dtype: object) as output. Below is the code that I'm working with:
My document consists of:
product sub_product issue sub_issue consumer_complaint_narrative
company_public_response company state zipcode tags
consumer_consent_provided submitted_via date_sent_to_company
company_response_to_consumer timely_response consumer_disputed?
complaint_id
I want to extract :
sub_product issue sub_issue consumer_complaint_narrative
import pandas as pd
df=pd.read_csv("C:\\....\\consumer_complaints.csv")
df=df.stack(level=0)
df2 = df.filter(regex='[B-F]')
df[df2]
import pandas as pd
input_file = "C:\\....\\consumer_complaints.csv"
dataset = pd.read_csv(input_file)
df = pd.DataFrame(dataset)
cols = [1,2,3,4]
df = df[df.columns[cols]]
Here, cols holds the positions of the columns you want to select. Column positions in a DataFrame start from 0, so fill in your own list:
cols = []
You can also select columns by name. Just use the following line:
df = df[["Column Name","Column Name2"]]
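If only a few columns are needed, they can also be picked while reading the file, using the usecols parameter of pd.read_csv. The sketch below uses a tiny in-memory CSV with made-up values as a stand-in for consumer_complaints.csv; only the column names come from the question.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for consumer_complaints.csv (values are made up).
csv_text = (
    "product,sub_product,issue,sub_issue,consumer_complaint_narrative,company\n"
    "Loan,Auto,Billing,Late fee,Example text,Acme\n"
)

wanted = ["sub_product", "issue", "sub_issue", "consumer_complaint_narrative"]

# usecols limits parsing to the named columns; they come back in file order.
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(df.columns.tolist())
```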
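A simple way to achieve this is to slice by column label with .loc. The slice uses the real column names rather than spreadsheet letters, and it includes both ends:
df = pd.read_csv("C:\\....\\consumer_complaints.csv")
df2 = df.loc[:, 'sub_product':'consumer_complaint_narrative']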
Hope that helps.
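This worked for me, using positional slicing with .iloc (plain df[n1:n2] slices rows, not columns):
df = pd.read_csv("C:\\....\\consumer_complaints.csv")
df1 = df.iloc[:, n1:n2]
where n1 < n2 are column positions; the column at n2 itself is excluded. For example, if you want the columns at positions 3 to 5, use
df1 = df.iloc[:, 3:6]
For the first column, use
df1 = df.iloc[:, 0]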
Though not sure how to select a discontinuous range of columns.
We can also use .iloc. Given data in dataset2:
dataset2.iloc[:3, [1, 2]]
returns the first 3 rows of the second and third columns (remember that numbering starts at 0).
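For a discontinuous set of columns, .iloc accepts a list of positions, and NumPy's np.r_ helper can mix single positions with ranges. A small sketch on a made-up 3x6 frame (the column names a to f are placeholders):

```python
import numpy as np
import pandas as pd

# Made-up frame with six columns named a..f.
df = pd.DataFrame(np.arange(18).reshape(3, 6), columns=list("abcdef"))

# Explicit list of positions: columns 0, 2 and 5.
picked = df.iloc[:, [0, 2, 5]]

# np.r_ concatenates positions and ranges: column 0 plus columns 3, 4, 5.
ranged = df.iloc[:, np.r_[0, 3:6]]

print(picked.columns.tolist(), ranged.columns.tolist())
```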
I have a DataFrame with around 40 columns. They contain (made-up) scores for a test. The columns are currently named as follows:
Student, Date, Score, Score.1, Score.2 all the way to Score.39.
We were asked to reset the column names so they match the score (change Score to Score.1, Score.1 to Score.2, Score.2 to Score.3 and so on).
My code looks like this now:
import pandas as pd
prog = pd.read_excel('File.xlsx')
for c in prog.columns:
    prog[c].rename(columns = lambda x : 'Score_' + x)
Unfortunately, this does not give the output I want. I was hoping someone could show me how to do this.
Thanks in advance
John Galt came up with the solution in the comments (with xrange replaced by range so it runs on Python 3):
cols = df.columns.tolist()
df.columns = cols[:2] + ['Score_%i' % i for i in range(1, len(cols[2:]) + 1)]
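A self-contained sketch of the same idea on a shortened, made-up frame follows. It uses the 'Score.N' naming asked for in the question instead of 'Score_N', and it assumes the first two columns are always Student and Date.

```python
import pandas as pd

# Shortened stand-in for the Excel sheet (one made-up row, three score columns).
cols = ["Student", "Date", "Score", "Score.1", "Score.2"]
prog = pd.DataFrame([["Ann", "2020-01-01", 7, 8, 9]], columns=cols)

# Keep the first two names and renumber every score column starting at 1.
n_scores = len(cols) - 2
prog.columns = cols[:2] + ["Score.%d" % i for i in range(1, n_scores + 1)]

print(prog.columns.tolist())
```

Assigning a full list to prog.columns renames everything in one step, which avoids calling rename on each column inside a loop.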