Pandas changing the first column - python

Hello, as you can see here, I have a CSV file. The problem is that the first (leftmost) column does not start at 1; instead it shows the values of the "gene" column. How can I fix it?
I want it to start at 1 and run to the end of the list.

Did you do this?
import pandas as pd
pd.read_csv('your_file.csv', index_col='Gene')
Just remove index_col and it should, by default, create a new index column.
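A minimal sketch of that suggestion, assuming a small in-memory CSV in place of the real file (the column names and values here are made up for illustration). Note that the default index pandas creates starts at 0, so the labels are shifted by one to get numbering from 1:

```python
import io
import pandas as pd

# Hypothetical stand-in for the asker's CSV file
csv_text = "Gene,Value\nBRCA1,0.5\nTP53,1.2\nEGFR,0.9\n"

# Without index_col, pandas builds a default integer index starting at 0
df = pd.read_csv(io.StringIO(csv_text))

# Relabel the rows so the index runs from 1 to len(df)
df.index = range(1, len(df) + 1)

print(df)
```

The "Gene" column stays a regular column, and the leftmost labels are now plain row numbers.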

Related

Put a new column inside a Dataframe Python

The DataFrame I am working on has a column called "Brand" that contains the value "SEAT " with trailing white space. I managed to strip the white space, but I don't know how to put the cleaned column back into the original DataFrame. I need this because I want to filter the original DataFrame by "SEAT" and show those rows.
I tried this:
import pandas as pd
brand_reviewed = df_csv2.Brand.str.rstrip()
brand_correct = 'SEAT'
brand_reviewed.loc[brand_reviewed['Brand'].isin(brand_correct)]
Thank you very much!
As far as I understand, you're trying to return the rows whose brand is "SEAT".
You are not forced to create a new column. You can filter the original DataFrame directly:
df2 = df_csv2[df_csv2.Brand.str.rstrip() == "SEAT"]
print(df2)
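If the cleaned values should also be stored, one option is to overwrite the "Brand" column and then filter. This is only a sketch: the rows and the "Model" column below are invented sample data standing in for df_csv2.

```python
import pandas as pd

# Hypothetical sample data standing in for df_csv2
df_csv2 = pd.DataFrame({
    "Brand": ["SEAT ", "AUDI", "SEAT ", "BMW "],
    "Model": ["Ibiza", "A3", "Leon", "X1"],
})

# Overwrite the column with its right-stripped version
df_csv2["Brand"] = df_csv2["Brand"].str.rstrip()

# Now a plain equality test is enough to select the rows
seat_rows = df_csv2[df_csv2["Brand"] == "SEAT"]
print(seat_rows)
```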
You have done well. I will also mention another way to clean the white space. And if you just want to add a new column to your current DataFrame, use the last line of this code.
import pandas as pd
brand_reviewed = pd.read_csv("df_csv2.csv")
# strip() removes white space on both sides of each value
data2 = brand_reviewed["Brand"].str.strip()
brand_reviewed["New Column"] = data2
If you have another query, let me know.
Octavio Velázquez

I have to extract all the rows in a .csv corresponding to the rows with 'watermelon' through pandas

I am using this code, but instead of a new file with just the required rows, I'm getting an empty .csv with only the header.
import pandas as pd
df = pd.read_csv("E:/Mac&cheese.csv")
newdf = df[df["fruit"]=="watermelon"+"*"]
newdf.to_csv("E:/Mac&cheese(2).csv",index=False)
I believe the problem is in how you select the rows containing the word "watermelon". Instead of:
newdf = df[df["fruit"]=="watermelon"+"*"]
Try:
newdf = df[df["fruit"].str.contains("watermelon")]
In your example, pandas is looking for cells that are exactly equal to the string "watermelon*"; the asterisk is not treated as a wildcard.
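The difference can be seen on a small made-up DataFrame (the rows and the "qty" column are assumptions for illustration). Passing na=False is an optional extra that makes missing values count as non-matches instead of raising an error during indexing:

```python
import pandas as pd

# Hypothetical sample data; the real file has its own rows
df = pd.DataFrame({
    "fruit": ["watermelon", "watermelon juice", "apple", None],
    "qty": [3, 1, 5, 2],
})

# Equality compares whole cells against the literal text "watermelon*"
exact = df[df["fruit"] == "watermelon" + "*"]

# str.contains matches any cell that includes the substring
partial = df[df["fruit"].str.contains("watermelon", na=False)]

print(len(exact))
print(partial)
```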
You are missing the underscore in pd.read_csv in the first call. Also, the file location looks incorrect: it is missing the // in the path.

How to update selected headers only in a CSV using Python?

I want to change my csv headers from
column_1, column_2, ABC_column, column_4, XYZ_column
To
new_column_1, new_column_2, ABC_column, new_column_4, XYZ_column
I can easily change all the columns using writer.writerow, but when a new name appears in place of ABC_column I want to keep that as well. For example, if it arrives as DEF_column instead of ABC_column, I don't want to change that either.
So it should only change the columns that are not in 3rd or 5th place, and leave whatever comes in 3rd and 5th place as it is.
Use pandas:
import pandas as pd
df = pd.read_csv(path_to_csv)
df = df.rename(columns={'column_1': 'new_column_1', 'column_2': 'new_column_2' ... })
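# index=False keeps pandas from writing the row index as an extra column
df.to_csv(path_to_csv, index=False)
You can apply any kind of renaming logic when building that dictionary.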
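Since the question asks to leave the 3rd and 5th headers alone whatever their names are, the renaming can also be driven by position instead of by name. A sketch under that assumption, using an empty DataFrame with hypothetical headers and a "new_" prefix taken from the question:

```python
import pandas as pd

# Hypothetical headers; the third one arrives as DEF_column this time
df = pd.DataFrame(columns=["column_1", "column_2", "DEF_column", "column_4", "XYZ_column"])

# Zero-based positions of the 3rd and 5th columns, which must stay unchanged
keep_positions = {2, 4}

# Prefix every other header with "new_"
df.columns = [
    name if i in keep_positions else "new_" + name
    for i, name in enumerate(df.columns)
]

print(list(df.columns))
```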

Add new column to dataframe using Pandas but it goes beyond the length of the data

I'm trying to add a new column to a pandas DataFrame. I am also trying to give the index a name so that it is printed in Excel when I export the data.
import pandas as pd
import csv
#read csv file
file='RALS-04.csv'
df=pd.read_csv(file)
#select the columns that I want
column1=df.iloc[:,0]
column2=df.iloc[:,2]
column3=df.iloc[:,3]
column1.index.name="items"
column2.index.name="march2012"
column3.index.name="march2011"
df=pd.concat([column1, column2, column3], axis=1)
#create a new column with 'RALS' as a defaut value
df['comps']='RALS'
#writing them back to a new CSV file
with open('test.csv','a') as f:
    df.to_csv(f, index=False, header=True)
In the output, the 'RALS' value that I added to the DataFrame runs down to row 2000, while the data stops at row 15. How can I constrain 'RALS' so that it doesn't go beyond the length of the data being exported? I would also prefer an elegant, automated way rather than specifying the row at which the default value should stop.
The second question: the labels that I assigned to the columns using index.name do not appear in the output. Instead they are replaced by a 0 and a 1. Please advise.
Thanks so much for your input.

Append new data to a dataframe

I have a csv file with many columns but for simplicity I am explaining the problem using only 3 columns. The column names are 'user', 'A' and 'B'. I have read the file using the read_csv function in pandas. The data is stored as a data frame.
Now I want to remove some rows in this dataframe based on their values. So if value in column A is not equal to a and column B is not equal to b I want to skip those user rows.
The problem is I want to dynamically create a dataframe to which I can append one row at a time. Also I do not know the number of rows that there would be. Therefore, I cannot specify the index when defining the dataframe.
I am using the following code:
import pandas as pd
header=['user','A','B']
userdata=pd.read_csv('.../path/to/file.csv',sep='\t', usecols=header);
df = pd.DataFrame(columns=header)
for index, row in userdata.iterrows():
    if row['A']!='a' and row['B']!='b':
        data= {'user' : row['user'], 'A' : row['A'], 'B' : row['B']}
        df.append(data,ignore_index=True)
The 'data' is being populated properly but I am not able to append. At the end, df comes to be empty.
Any help would be appreciated.
Thank you in advance.
Regarding your immediate problem, append() doesn't modify the DataFrame; it returns a new one. So you would have to reassign df via:
df = df.append(data,ignore_index=True)
But a better solution would be to avoid iteration altogether and simply query for the rows you want. For example:
df = userdata.query('A != "a" and B != "b"')
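The same filter can be written as a boolean mask, shown here on made-up rows. Because DataFrame.append was removed in pandas 2.0, the sketch also shows one way to keep a loop when it is unavoidable: collect the rows in a plain list and build the DataFrame once at the end. The sample data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the tab-separated file
userdata = pd.DataFrame({
    "user": ["u1", "u2", "u3", "u4", "u5"],
    "A": ["a", "x", "a", "y", "z"],
    "B": ["b", "b", "z", "w", "q"],
})

# Vectorised filter: keep rows where A is not "a" and B is not "b"
mask = (userdata["A"] != "a") & (userdata["B"] != "b")
filtered = userdata[mask].reset_index(drop=True)

# Loop-based alternative: gather matching rows, then build the frame once
rows = [
    row.to_dict()
    for _, row in userdata.iterrows()
    if row["A"] != "a" and row["B"] != "b"
]
looped = pd.DataFrame(rows, columns=userdata.columns)

print(filtered)
```

The mask version avoids Python-level iteration and is usually the better choice on large files.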
