So I have an Excel sheet with the following format:
Now what I'm looking to do is loop through each index cell in column A and assign all cells the same value until the next 0 is reached. So, for example:
I have tried importing the Excel file into a pandas dataframe and then using for loops to do this, but I can't seem to make it work. Any suggestions or directions to the appropriate method would be much appreciated!
Thank you for your time
Edit:
Using @wen-ben's method: s.index=pd.Series((s.index==0).cumsum()).map({1:'bananas',2:'cherries',3:'pineapples'})
just enters the first element (bananas) for all cells in column A.
Assuming you have a dataframe s, you can use cumsum:
s.index=pd.Series((s.index==0).cumsum()).map({1:'bananas',2:'cherries',3:'pineapples'})
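One possible reason for the "all bananas" result: if the dataframe still has the default 0..n-1 index, only the first row equals 0, so the cumulative sum is 1 everywhere. Here is a minimal sketch of the same cumsum idea applied to a regular column instead of the index. The column names, the sample values and the labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data: column "A" restarts at 0 whenever a new block begins.
df = pd.DataFrame({"A": [0, 1, 2, 0, 1, 0, 1, 2], "value": range(8)})

# Every 0 marks the start of a block, so the running count of zeros
# numbers the blocks 1, 2, 3, ...
block = (df["A"] == 0).cumsum()

# Hypothetical labels, one per block number.
labels = {1: "bananas", 2: "cherries", 3: "pineapples"}
df["A"] = block.map(labels)

print(df)
```

If the values in column A were read from Excel as text, the comparison would need to be against "0" rather than 0.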
Related
I am looking to delete rows in a dataframe that is imported into Python by pandas.
As you can see in the sheet below, the first column has the same name multiple times. So the condition is: if the first column's value reappears in the next row, delete that row. If not, keep that row in the dataframe.
My final output should look like the following:
Presently I am doing it by converting each column into a list and deleting entries by index value. I am hoping there is an easier way than this workaround.
df.drop_duplicates([df.columns[0]])
should do the trick.
Try the following code:
df.drop_duplicates(subset='columnName', keep='first', inplace=True)
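To make the difference concrete, here is a small sketch with made-up data (the column names "name" and "score" are assumptions). drop_duplicates removes every later occurrence of a value, while comparing each row with the previous one via shift() removes only repeats that sit directly under each other, which may be closer to what the question describes.

```python
import pandas as pd

# Hypothetical data: "a" repeats right away and then again at the end.
df = pd.DataFrame({
    "name": ["a", "a", "b", "c", "c", "a"],
    "score": [1, 2, 3, 4, 5, 6],
})

# Keep only the first occurrence of each name, wherever it appears.
first_only = df.drop_duplicates(subset="name", keep="first")

# Keep a row unless its name equals the name in the row directly above it.
no_consecutive = df[df["name"] != df["name"].shift()]

print(first_only)
print(no_consecutive)
```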
I imported a .csv file with a single column of data into a dataframe that I am trying to clean up by splitting the column based on various string occurrences within the cells. I've tried numerous means to split the column, but can't seem to get it to work. My latest attempt was using the following:
df.loc[:,'DataCol'] = df.DataCol.str.split(pat=':\n',expand=True)
df
The result is a dataframe that is still one column and completely unchanged. What am I doing wrong? This is my first time doing anything like this so please forgive the simple question.
df.loc creates a copy of the column you've selected. Try replacing the code below with df['DataCol'], which references the actual column in the original dataframe.
df.loc[:,'DataCol']
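Separately from the df.loc point, note that str.split with expand=True returns a new dataframe with one column per piece, so it may be easier to store the pieces in new columns rather than assigning them back to the single original column. A minimal sketch, assuming every cell contains the ':\n' separator at most once that matters (the sample strings and the new column names "field" and "value" are made up):

```python
import pandas as pd

# Hypothetical single-column data, similar in shape to the question.
df = pd.DataFrame({"DataCol": ["name:\nAlice", "city:\nParis"]})

# Split each cell once on ':\n'; expand=True gives one column per piece.
parts = df["DataCol"].str.split(":\n", n=1, expand=True)
parts.columns = ["field", "value"]

# Attach the new columns next to the original one.
df = df.join(parts)

print(df)
```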
I was able to append dataframes, but as they are added, each one appears at the end of the one previously appended, and so on.
Each dataframe has a different header name.
Here’s what I’ve tried so far:
df1 = df1.append(dforiginal,sort=False, ignore_index=False)
What’s more, every time they are appended, their index is set back to 0. Is it possible to append each dataframe all starting at Index=0?
The screenshots below show what I'm getting (top image) and what I'm trying to accomplish (bottom image).
Thanks.
If I understood you correctly, you want to add rows instead of columns to your DataFrame, don't you?
Nevertheless, you could use for example this website to get a general overview on how to use the append function: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Moreover, you can reset the index if you set the keyword ignore_index as True.
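If the goal is instead to place the dataframes side by side, each starting at index 0, one option is pd.concat with axis=1 after resetting each index. This is only a sketch under that assumption; the frame names and contents are invented.

```python
import pandas as pd

# Hypothetical frames with different headers, lengths and indexes.
df_a = pd.DataFrame({"alpha": [1, 2, 3]})
df_b = pd.DataFrame({"beta": [10, 20]}, index=[5, 6])
frames = [df_a, df_b]

# Reset every index to 0..n-1 so rows line up, then join column-wise.
side_by_side = pd.concat(
    [frame.reset_index(drop=True) for frame in frames],
    axis=1,
)

print(side_by_side)
```

Shorter frames are padded with NaN at the bottom, since all columns share one index.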
I must read each row of an Excel file and perform calculations based on the contents of each row. Each row is divided into columns; my problem is that I cannot find a way to access the contents of those columns.
I'm reading the rows with:
for i in df.index,:
    print(df.loc[i])
Which works well, but when I try to access, say, the 4th column with this type of indexing I get an error:
for i in df.index,:
    print(df.loc[i][3])
I'm pretty sure I'm approaching the indexing issue in the wrong way, but I cannot figure out how to solve it.
You can use iterrows(), like in the following code:
for index, row in dataFrame.iterrows():
    print(row)
But this is not the most efficient way to iterate over a pandas DataFrame; more info at this post.
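For reaching the 4th column by position, here is a short sketch with made-up data. Note that the loop is written without the trailing comma after df.index: with the comma, Python iterates over a one-element tuple, so the loop body runs once with the whole index rather than once per row.

```python
import pandas as pd

# Hypothetical data with four columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Row by row: .iloc[3] picks the 4th value of each row by position.
fourth_by_loc = [df.loc[i].iloc[3] for i in df.index]

# Usually faster: itertuples yields plain tuples, indexable by position.
fourth_by_tuple = [row[3] for row in df.itertuples(index=False)]

# Often no loop is needed: operate on the whole 4th column at once.
doubled = df.iloc[:, 3] * 2

print(fourth_by_loc, fourth_by_tuple, doubled.tolist())
```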
I have a little problem in Excel using xlwings and I really don't know how to fix it.
I'm using a UDF that returns, for example, a pandas dataframe. Suppose the dataframe is 3 columns wide (the number of rows doesn't matter). If I write some data in the 4th column in Excel, the returned dataframe erases it as soon as I recalculate the sheet, even though the dataframe does not use this column at all: it is 3 columns wide, not 4.
I don't know if I'm being clear enough. Let me know!
Thank you very much in advance.
import xlwings as xw

@xw.func
@xw.ret(expand='table')
def hello(nb):
    nb = int(nb)
    return [["hello", "you"] for i in range(nb)]
Before recalculating the sheet:
After recalculating the sheet:
It seems, from the xlwings documentation, that the returned table needs an empty row below it and an empty column to its right; otherwise it will overwrite whatever is there.
http://docs.xlwings.org/en/stable/api.html#xlwings.xlwings.ret