I'm trying to make Python append all the data from count=1 onward to the next column, but it ends up at the bottom of the result from count=0.
I'm using 'self' because this code is inside a class method. The first time through, when 'count==0', it makes two columns: the first is 'self.header' and the second is 'self.oneVariableSum(self.times2)'. But once count goes to 1, it adds 'self.oneVariableSum(self.times2)' to the bottom of the second column, and I need it to be in a new column instead.
I have that portion of the code below, but I can't figure out what I'm doing wrong.
if (count==0):
    self.all.append([self.header,self.oneVariableSum(self.times2)])
else:
    self.all.append([[None,self.oneVariableSum(self.times2)]])
As the others said, it's not really possible or easy to do this with Python lists. I ended up converting it to a pandas DataFrame and used the line below to append the new result as a new column:
self.result=pd.concat([all,all2],axis=1, sort=False)
This did the trick.
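A minimal sketch of the same idea, assuming each pass of the loop yields one list of values. The helper one_variable_sum and the sample header/times2 values are hypothetical stand-ins for self.oneVariableSum(self.times2), self.header and self.times2. Each result is collected as a named Series and the pieces are concatenated once with axis=1, so every pass becomes its own column:

```python
import pandas as pd

# Hypothetical stand-in for self.oneVariableSum(self.times2): one value per row.
def one_variable_sum(values, factor):
    return [v * factor for v in values]

header = ["a", "b", "c"]   # plays the role of self.header
times2 = [1, 2, 3]         # plays the role of self.times2

columns = [pd.Series(header, name="header")]
for count in range(3):
    # Each pass produces a new named column instead of extending the previous one.
    columns.append(pd.Series(one_variable_sum(times2, count + 1), name=f"run_{count}"))

# axis=1 places the Series side by side, aligned on their shared 0..n-1 index.
result = pd.concat(columns, axis=1)
print(result)
```

Concatenating once after the loop, rather than calling pd.concat on every iteration, avoids rebuilding the whole DataFrame each time.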
Related
I am looking to delete rows in a dataframe that is imported into Python with pandas.
If you look at the sheet below, the first column has the same name multiple times. The condition is: if the first-column value reappears in a later row, delete that row; if not, keep the row in the dataframe.
My final output should look like the following:
Presently I am doing it by converting each column into a list and deleting entries by index value. I am hoping there is an easier way than this workaround.
df = df.drop_duplicates(subset=[df.columns[0]])
should do the trick.
Try the following code:
df.drop_duplicates(subset='columnName', keep='first', inplace=True)
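A small self-contained check of the drop_duplicates approach. The column names name/value and the sample rows are made up for illustration. Without inplace=True, drop_duplicates returns a new DataFrame, so the result has to be assigned; keep="first" (the default) keeps the first occurrence and drops the later repeats, while keep=False would drop every row whose value occurs more than once.

```python
import pandas as pd

# Sample data: the first column repeats "x" and "y".
df = pd.DataFrame({
    "name": ["x", "x", "y", "z", "y"],
    "value": [1, 2, 3, 4, 5],
})

# Keep the first row for each value in the first column; drop the later repeats.
deduped = df.drop_duplicates(subset=[df.columns[0]], keep="first")
print(deduped)
```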
I imported a .csv file with a single column of data into a dataframe that I am trying to clean up by splitting the column based on various string occurrences within the cells. I've tried several ways to split the column, but can't seem to get it to work. My latest attempt was the following:
df.loc[:,'DataCol'] = df.DataCol.str.split(pat=':\n',expand=True)
df
The result is a dataframe that is still one column and completely unchanged. What am I doing wrong? This is my first time doing anything like this, so please forgive the simple question.
df.loc creates a copy of the column you've selected. Try replacing the code below with df['DataCol'], which references the actual column in the original dataframe:
df.loc[:,'DataCol']
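For comparison, here is a sketch of a different approach from the one suggested in that answer. str.split(..., expand=True) returns a DataFrame with one column per split piece, so the pieces can be given their own labels and joined back onto the original frame instead of being written into the single column 'DataCol'. The labels Key/Value and the sample strings are assumptions for illustration only.

```python
import pandas as pd

# Sample single-column data where each cell holds "key:\nvalue".
df = pd.DataFrame({"DataCol": ["name:\nAlice", "city:\nParis", "age:\n30"]})

# expand=True returns a DataFrame with one column per piece (labelled 0 and 1 here).
parts = df["DataCol"].str.split(":\n", expand=True)

# Give the pieces hypothetical labels and join them onto the original frame by index.
parts.columns = ["Key", "Value"]
df = df.join(parts)
print(df)
```

If a cell does not contain the separator, its extra columns come back empty (None), which is worth checking when the data mixes several patterns.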
I was able to append dataframes, but as they are added, each one appears below the one appended before it, and so on.
Each dataframe has a different header name.
Here’s what I’ve tried so far:
df1 = df1.append(dforiginal,sort=False, ignore_index=False)
What's more, every time one is appended, its index starts again from 0. Is it possible to append the dataframes so that each one starts at index 0?
The screenshots below show what I'm getting (top image) and what I'm trying to accomplish (bottom image).
Thanks.
If I understood your point correctly, you want to add rows instead of columns to your DataFrame, don't you?
Nevertheless, you could use this page, for example, to get a general overview of how to use the append function: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Moreover, you can reset the index by setting the keyword argument ignore_index to True.
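If the goal is instead the layout from the bottom image, with each dataframe as its own set of columns and all of them starting at index 0, one option is to renumber each frame's index and concatenate along axis=1 rather than appending rows. A sketch, where df_a and df_b are hypothetical stand-ins for the frames being combined:

```python
import pandas as pd

# Two frames with different headers and different indexes.
df_a = pd.DataFrame({"A": [1, 2, 3]}, index=[5, 6, 7])
df_b = pd.DataFrame({"B": [10, 20]}, index=[0, 1])

# reset_index(drop=True) renumbers each frame from 0, so rows line up by position.
frames = [d.reset_index(drop=True) for d in (df_a, df_b)]

# axis=1 places the frames side by side; the shorter one is padded with NaN.
combined = pd.concat(frames, axis=1)
print(combined)
```

pd.concat is also the replacement for DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0.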
So I have an Excel sheet with the following format:
Now what I'm looking to do is loop through each index cell in column A and assign all cells the same value until the next 0 is reached. So, for example:
I have tried importing the Excel file into a pandas dataframe and then using for loops to do this, but I can't seem to make it work. Any suggestions or pointers to the appropriate method would be much appreciated!
Thank you for your time
Edit:
Using Wen-Ben's method: s.index=pd.Series((s.index==0).cumsum()).map({1:'bananas',2:'cherries',3:'pineapples'})
just assigns the first element (bananas) to all cells in column A.
Assuming your dataframe is s, you can use cumsum:
s.index=pd.Series((s.index==0).cumsum()).map({1:'bananas',2:'cherries',3:'pineapples'})
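The one-liner tests s.index == 0, so it only works when the counter that restarts at 0 is the dataframe's index. If read_excel gave the frame a default 0..n-1 index, only the first row equals 0, every row falls into group 1, and everything would map to bananas. A sketch that applies the same cumsum idea to a column instead, assuming the restarting counter sits in a column named "A" (an assumption; adjust to the real header):

```python
import pandas as pd

# Hypothetical sheet: column "A" restarts at 0 at the start of each group.
s = pd.DataFrame({"A": [0, 1, 2, 0, 1, 0, 1, 2, 3]})

# Each 0 starts a new group: the running count of zeros numbers the groups 1, 2, 3, ...
group = (s["A"] == 0).cumsum()

# Map group numbers to labels (label names taken from the example above).
s["label"] = group.map({1: "bananas", 2: "cherries", 3: "pineapples"})
print(s)
```

Any group number missing from the mapping dict becomes NaN, so the dict needs one entry per group.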
I read in a CSV file
times = pd.read_csv("times.csv",header=0)
times.columns.values
The column names are stored in a tuple:
titles=('case','num_gen','year')
The real titles are much longer and more complex, but for simplicity's sake they are truncated here.
I want to access an element of a column of times by using an index into titles.
My attempt is:
times.titles[2][0]
This is to try to get the effect of:
times.year[0]
I need to do this because there are 75 columns that I need to access in a loop, so I cannot type out each column name as in the line above.
Any ideas on how to accomplish this?
I think you need to use .iloc; let's look at the pandas docs on selection by position:
times.iloc[2,0] #returns the value in the third row and first column; the indexes are zero-based.