for loop through multiple items based on a dataframe - python

So I have several dataframes of different widths.
I want a for loop that will perform an operation on each dataframe's columns:
Table 1:
col1  col2  col3
Hi    1     Jake
Bye   2     Mike
Red   Blue  Pink
Table 2:
cl1    cl2   cl3    c4
Frank  Toy   Hello  Present
Bike   Ride  Blue   Mike
Red    Blue  Pink   Fred
These tables are in the form of a list of tuples.
I want to replace these two loops with a single loop that uses the number of headers as the number of items to iterate over.
row = 1
col = 0
for col1, col2, col3 in table:
    worksheet.write(row, col, col1)
    worksheet.write(row, col + 1, col2)
    worksheet.write(row, col + 2, col3)
    row += 1
row = 1
col = 0
for cl1, cl2, cl3, cl4 in table:
    worksheet.write(row, col, cl1)
    worksheet.write(row, col + 1, cl2)
    worksheet.write(row, col + 2, cl3)
    worksheet.write(row, col + 3, cl4)
    row += 1
Here's what I want: to iterate through each column in the table, no matter the number of columns. This is what I think it would look like:
row = 1
col = 0
elements = table.column.names
for elements in (table):
    for i in elements:
        worksheet.write(row, col, i)
        col = col + 1
    row = row + 1

It looks to me like you are looking for something like this:
import pandas as pd
import xlsxwriter
# df2 is assumed to be an existing DataFrame holding the table
book = xlsxwriter.Workbook('example.xlsx')
worksheet = book.add_worksheet("sheet1")
n_rows, n_cols = df2.shape
for col in range(n_cols):
    for row in range(n_rows):
        worksheet.write(row, col, df2.iloc[row, col])
book.close()
This writes every cell of df2 to sheet1 of example.xlsx.

Firstly, you don't have any DataFrames in your code. In Python, a DataFrame is a specific container type provided by the pandas module. While you don't have any here, you probably should, as they are more useful than lists of tuples.
Secondly, if I understand you correctly, you want to loop through any table you provide and perform the same actions on it. This is what functions are for: declare a function, then call it with each table as a parameter. Assuming your tables are DataFrames:
def write_to_sheet(table):
    # enumerate() gives the positional row/column numbers the worksheet needs
    for r, row in enumerate(table.index):
        for c, column in enumerate(table.columns):
            worksheet.write(r, c, table.loc[row, column])
write_to_sheet(table1)
write_to_sheet(table2)
You will of course need to import your data into pandas dataframes but that is a subject for another question (which you will of course research before asking here as it has been asked many times before).
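If the tables stay as lists of tuples, as in the question, one nested loop with enumerate() can handle any number of columns. The following is only a minimal sketch: write_table and the RecordingSheet stand-in are hypothetical names, used so the example runs without a real workbook. With XlsxWriter you would pass the actual worksheet object instead, since the code only relies on a write(row, col, value) method.

```python
class RecordingSheet:
    """Hypothetical stand-in for a worksheet: records write(row, col, value) calls."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def write_table(worksheet, table, first_row=1, first_col=0):
    """Write each tuple in `table` as one row, whatever its width."""
    for r, record in enumerate(table, start=first_row):
        for c, value in enumerate(record, start=first_col):
            worksheet.write(r, c, value)


# Sample data copied from Table 1 and Table 2 in the question
table1 = [("Hi", 1, "Jake"), ("Bye", 2, "Mike"), ("Red", "Blue", "Pink")]
table2 = [
    ("Frank", "Toy", "Hello", "Present"),
    ("Bike", "Ride", "Blue", "Mike"),
    ("Red", "Blue", "Pink", "Fred"),
]

sheet1, sheet2 = RecordingSheet(), RecordingSheet()
write_table(sheet1, table1)
write_table(sheet2, table2)
```

The same function serves both tables because the inner loop takes its length from each tuple rather than from a fixed list of variable names.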

Related

Merging two rows in pandas to one header cell per column

I have a table in pandas that looks like this:
0  A       Another
1  header  header
2  First   row
3  Second  row
and I would like to have a table like this:
0  A header  Another header
1  First     row
2  Second    row
How can I merge rows 0 and 1 into one header cell per column?
There is a question about that column '0', if indeed it is one. But if it is not (and is just the index having been pasted slightly incorrectly), then I would do:
newdf = df.iloc[1:].set_axis(df.columns + ' ' + df.iloc[0], axis=1)
>>> newdf
  A header Another header
1    First            row
2   Second            row
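For anyone who wants to try this on a small frame, here is a self-contained sketch of the same idea. The sample data is an assumption based on the table in the question (with the leading column treated as the index), and the labels are combined with zip() instead of the set_axis one-liner, which is simply an alternative spelling.

```python
import pandas as pd

# Assumed sample frame: the first data row holds the second line of the header.
df = pd.DataFrame(
    {"A": ["header", "First", "Second"], "Another": ["header", "row", "row"]}
)

# Pair each column label with the value in the first row, then drop that row.
new_columns = [f"{label} {extra}" for label, extra in zip(df.columns, df.iloc[0])]
newdf = df.iloc[1:].copy()
newdf.columns = new_columns

print(newdf)
```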

How to move value to another columns by if else condition?

I want to move a column's value to another column depending on a condition.
In the table below, if column A is 4 or more, the value of A1_1 is moved to A1_3; if it is 3, the value is moved to A1_2; and if it is 2 or less, the value is kept in A1_1.
I want to apply the same logic to columns B, B1_1, B1_2, and B1_3.
How to approach it?
A B A1_1 A1_2 A1_3 B1_1 B1_2 B1_3
1 1 Apple Apple
2 2 Banana Banana
3 3 Tomato Tomato
4 4 Apple Apple
5 5 Banana Banana
You can iterate over your DataFrame and apply this logic:
import numpy as np
for index, row in df.iterrows():
    if row['A'] >= 4:
        # Write through df.at: the row yielded by iterrows() is a copy
        df.at[index, 'A1_3'] = row['A1_1']
        df.at[index, 'A1_1'] = np.nan
    elif row['A'] == 3:
        df.at[index, 'A1_2'] = row['A1_1']
        df.at[index, 'A1_1'] = np.nan
df.iterrows() returns the index and the row on each iteration: index is the row's label, and row is the entire row, whose cells you can read with row['Name_Of_Column']. Assigning to row does not change df, so the updates go through df.at.
According to this answer, you can replace np.nan with None or 0, depending on your needs.
You can use .apply(axis=1) to process each row and move the values as you see fit.
def update_row(row):
    col = 'A1_1'
    if row['A'] > 3:
        col = 'A1_3'
    elif row['A'] == 3:
        col = 'A1_2'
    # Set A1_1 to an empty string and move the value to the required column
    a1_val = row['A1_1']
    row['A1_1'] = ""
    row[col] = a1_val
    return row
df = df.apply(update_row, axis=1)
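A loop-free variant is also possible with boolean masks and .loc. The sketch below is one way to do it under some assumptions: the sample data is made up to resemble the question's table, move_by_threshold is a hypothetical helper name, and the thresholds (4 or more, exactly 3) are taken from the question.

```python
import pandas as pd

# Assumed sample data: every value starts out in A1_1.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "A1_1": ["Apple", "Banana", "Tomato", "Apple", "Banana"],
    "A1_2": [None] * 5,
    "A1_3": [None] * 5,
})


def move_by_threshold(frame, key, source, mid, high):
    """Move `source` to `high` where key >= 4 and to `mid` where key == 3."""
    out = frame.copy()
    to_high = out[key] >= 4
    to_mid = out[key] == 3
    out.loc[to_high, high] = out.loc[to_high, source]
    out.loc[to_mid, mid] = out.loc[to_mid, source]
    # Blank the source cell for every row whose value was moved.
    out.loc[to_high | to_mid, source] = None
    return out


result = move_by_threshold(df, "A", "A1_1", "A1_2", "A1_3")
print(result)
```

Calling the helper a second time with "B", "B1_1", "B1_2" and "B1_3" would apply the same rule to the B columns.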

How to display a list pandas DataFrame cell as multiple lines

Not sure if this makes any sense but essentially I have a dataframe that looks something like this:
col 1 (str)  col 2 (int)  col 3 (list)
name1        num          [text(01),text(02),...,text(n)]
name2        num          [text(11),text(12),...,text(m)]
Where one of the columns is a list of strings, in this case col 3, and n!=m.
What I would like to know is if there is a way to display them in a more readable manner, such as:
col 1 (str)  col 2 (int)  col 3 (list)
name1        num          text(01)
                          ...
                          text(n)
name2        num          text(11)
                          ...
                          text(m)
I appreciate this looks messy but my intention is for all the texts to be displayed in one cell, just with line breaks, rather than being split across multiple rows as the table above shows.
Thank you in advance.
pandas has an explode function that partially solves the problem, but the values in col1 and col2 will be duplicated for each list element.
The explode() function transforms each element of a list-like into its own row, replicating the index values.
df1 = df1.explode('col3name')
Use explode on col3, the column with list values:
In [44]: df.explode('col3')
Out[44]:
    col1 col2      col3
0  name1  num  text(01)
0  name1  num  text(02)
1  name2  num  text(11)
1  name2  num  text(12)
You could then use set_index:
In [53]: df.explode('col3').set_index(['col1', 'col2'])
Out[53]:
                col3
col1  col2
name1 num   text(01)
      num   text(02)
name2 num   text(11)
      num   text(12)
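Since the question asks for all the texts to stay in one cell, separated by line breaks, another option is to join each list into a single newline-separated string instead of exploding it. This is a rough sketch with made-up sample values. Whether the line breaks are actually rendered depends on where the frame is displayed, so treat it as a starting point.

```python
import pandas as pd

# Assumed sample data shaped like the question's table.
df = pd.DataFrame({
    "col1": ["name1", "name2"],
    "col2": [10, 20],
    "col3": [["text(01)", "text(02)"], ["text(11)", "text(12)", "text(13)"]],
})

# Keep one row per name; turn each list into a single multi-line string.
display_df = df.copy()
display_df["col3"] = display_df["col3"].apply("\n".join)

# Printing a single cell shows one text per line.
print(display_df.loc[0, "col3"])
```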

Create a column out of the 2nd portion of text of two columns in pandas

I have a dataframe with two columns. I want to create a third column that is the "sum" of the first two columns, but without the first bit of each column. I think this is best shown in an example:
col1 col2 col3 (need to make)
abc_what_I_want1 abc_what_I_want1 what_I_want1what_I_want1
psdb_what_I_want2 what_I_want2
vxc_what_I_want3 vxc_what_I_want3 what_I_want3what_I_want3
qk_what_I_want4 qk_what_I_want4 what_I_want4what_I_want4
ertsa_what_I_want5 what_I_want5
abc_what_I_want6 abc_what_I_want6 what_I_want6what_I_want6
Note that what_I_want# will be different for every row, but the same between columns in the same row. The prefix will always be the same for each row but can differ/repeat between rows. Cells shown as blank are "" strings.
The code I have so far:
df["col3"] = df["col1"].str.split("_", 1) + df["col2"].str.split("_", 1)
From there I wanted just the 2nd (or last) element of the split so I tried both of the following:
df["col3"] = df["col1"].str.split("_", 1)[1] + df["col2"].str.split("_", 1)[1]
df["col3"] = df["col1"].str.split("_", 1)[-1] + df["col2"].str.split("_", 1)[-1]
Both of these returned errors. I think the first error is because of replicated values (ValueError: cannot reindex from a duplicate axis). The second is a KeyError.
You were actually quite close, just needed to select the correct slice with str[1] and meanwhile fillna for the empty cells:
m = df['col1'].str.split('_', n=1).str[1].fillna('') + df['col2'].str.split('_', n=1).str[1].fillna('')
df['col3'] = m
col1 col2 col3
0 abc_what_I_want1 abc_what_I_want1 what_I_want1what_I_want1
1 psdb_what_I_want2 what_I_want2
2 vxc_what_I_want3 vxc_what_I_want3 what_I_want3what_I_want3
3 qk_what_I_want4 qk_what_I_want4 what_I_want4what_I_want4
4 ertsa_what_I_want5 what_I_want5
5 abc_what_I_want6 abc_what_I_want6 what_I_want6what_I_want6
Another method would be to use apply where you can apply split on multiple columns at once:
m = df[['col1', 'col2']].apply(lambda x: x.str.split('_', n=1).str[1]).fillna('')
df['col3'] = m['col1']+m['col2']
col1 col2 col3
0 abc_what_I_want1 abc_what_I_want1 what_I_want1what_I_want1
1 psdb_what_I_want2 what_I_want2
2 vxc_what_I_want3 vxc_what_I_want3 what_I_want3what_I_want3
3 qk_what_I_want4 qk_what_I_want4 what_I_want4what_I_want4
4 ertsa_what_I_want5 what_I_want5
5 abc_what_I_want6 abc_what_I_want6 what_I_want6what_I_want6
You can replace() all characters up to and including the first underscore and then apply() a join() or sum() on axis=1:
df['Col3']=df.replace('^[^_]*_','',regex=True).fillna('').apply(''.join,axis=1)
Or:
df['Col3']=df.replace('^[^_]*_','',regex=True).fillna('').sum(axis=1)
Or:
df['Col3']=(pd.Series(df.replace('^[^_]*_','',regex=True).fillna('').values.tolist())
.str.join(''))
col1 col2 Col3
0 abc_what_I_want1 abc_what_I_want1 what_I_want1what_I_want1
1 psdb_what_I_want2 what_I_want2 what_I_want2I_want2
2 vxc_what_I_want3 vxc_what_I_want3 what_I_want3what_I_want3
3 qk_what_I_want4 qk_what_I_want4 what_I_want4what_I_want4
4 NaN ertsa_what_I_want5 what_I_want5
5 abc_what_I_want6 abc_what_I_want6 what_I_want6what_I_want6
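The split-based approach can be tried end to end with the sketch below. The sample rows are a shortened, assumed version of the question's data (which of the two columns is blank in a given row is a guess), and strip_prefix is a hypothetical helper name. The key point is that .str[1] takes element 1 of each split list, whereas plain [1] on the Series looks up the row labelled 1, which is why the attempts in the question failed.

```python
import pandas as pd

# Assumed sample rows; blanks are empty strings, as stated in the question.
df = pd.DataFrame({
    "col1": ["abc_what_I_want1", "psdb_what_I_want2", ""],
    "col2": ["abc_what_I_want1", "", "ertsa_what_I_want5"],
})


def strip_prefix(series):
    # n=1 splits on the first underscore only, so later underscores survive.
    # Cells without an underscore have no element 1 and become NaN, hence fillna.
    return series.str.split("_", n=1).str[1].fillna("")


df["col3"] = strip_prefix(df["col1"]) + strip_prefix(df["col2"])
print(df)
```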

Python w/ Excel sheet. For each value in First Col. calculate two columns.

I have an Excel sheet with, for example, 100 rows. In Col A the value changes every five rows: the first 5 rows have value 1, the next 5 rows have value 2, and so on, until the last 5 rows, which have value 20. So Col A holds 20 distinct values. I want to go through the Col A values and, for each one, compute Col B with Col C.
So, a traditional way to do that is:
n = 5
lst = [row for row in range(1, 100//n) for _ in range(n)]
row = 1
for row in lst:
    value = sheet.cell(row, 0).value
    if value == 1:
        ''' My code here'''
    elif value == 2:
        ''' My code here'''
    ...
    # I do this till I reach value == 20
I don't want to write many more lines of code to go through all the values in Col A. All these lines do the same job, except that the value in column A changes every 5 rows.
My question is: how can I write a loop, or use some other approach, to go through Col A? If you want me to elaborate, I'm happy to do so.
The output should be like this as an example:
For value # 1 in Col A. Total is: 50 (sum numbers in Col B with Col C)
For value # 2 in Col A. Total is: 86 (sum numbers in Col B with Col C)
For value # 3 in Col A. Total is: 99 (sum numbers in Col B with Col C)
...
Until Reach
For value # 20 in Col A. Total is: 4235 (sum numbers in Col B with Col C)
Note: how can I loop over the values in Col A and, for each one, calculate the values in Col B with Col C that correspond to it?
Not entirely sure if this is what you're asking, but enumerating the list will give you an index, a number that when divided by 5 gives you the value you seem to be looking for in column A.
for index, row in enumerate(lst):
    value = sheet.cell(row, 0).value
    # Integer division: every block of 5 consecutive rows maps to 1, 2, 3, ...
    columnA = index // 5 + 1
No need to divide the list into 5 at the end.
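If the goal is one total per distinct value in Col A, a single pass that accumulates into a dictionary avoids the chain of if/elif branches altogether. The sketch below assumes the sheet has already been read into (Col A, Col B, Col C) tuples. The sample numbers are invented, and reading the rows from the workbook is left out.

```python
from collections import defaultdict

# Hypothetical sample rows as (col_a, col_b, col_c); a real sheet would supply these.
rows = [
    (1, 4, 6), (1, 5, 5), (1, 3, 7), (1, 2, 8), (1, 6, 4),
    (2, 10, 1), (2, 9, 2), (2, 8, 3), (2, 7, 4), (2, 6, 5),
]

# Accumulate Col B + Col C under the Col A value of each row.
totals = defaultdict(int)
for col_a, col_b, col_c in rows:
    totals[col_a] += col_b + col_c

for value in sorted(totals):
    print(f"For value # {value} in Col A. Total is: {totals[value]}")
```

Because the grouping key is read from each row, this works for any number of distinct values and does not depend on each block being exactly five rows long.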
