Deleting row in worksheet - python

I'm trying to delete a whole row from a worksheet by its index, because I want to apply the 3-sigma clipping method. Here is my code:
import openpyxl
from statistics import mean, stdev
wb=openpyxl.load_workbook('try1.xlsx')
sheet=wb.get_sheet_by_name('Blad1')
v = []
for i in range(1,555):
    v.append(sheet['T'][i].value)
m = mean(v)
s = 3* stdev(v)
# clipping in the list BUT i dont know how to delete a row from a worksheet
for i in range(0,len(v)-1):
    #if the value is more then 3 sigma (=s) away from the mean, i want to delete the whole row
    # of information
    if v[i] >= m+ s or v[i] <= m-s:
        # tryin to delete row 'Ai':'Zi'
        sheet.delete(sheet[['A'][i]:['Z'][i]])

To delete rows, use sheet.delete_rows(idx, amount).
Here idx is the 1-based number of the first row to delete, and amount is the number of rows to delete starting from idx (it defaults to 1). For example, sheet.delete_rows(1, 26) deletes rows 1 through 26 across all columns; to delete only row 7, call sheet.delete_rows(7). Rows below the deleted ones shift up, so when deleting several rows in a loop, work from the bottom of the sheet upward.
If you only want to empty a cell, you can do sheet['A1'] = "".
You can also use move_range to shift cells, which overwrites any existing cells at the destination. For example:
sheet.move_range("A2:Z2", rows=-1, cols=2)
This moves the cells in the range A2:Z2 up one row and right two columns. (Row 1 cannot be moved up, because there is no row 0.) Change the values of rows and cols to whatever you need.
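When several rows have to go, as in the 3-sigma clipping from the question, deleting from the bottom up keeps the remaining row numbers valid. A minimal sketch of the selection step, using only the standard library; the helper name rows_to_delete, the first_row offset and the sample values are assumptions made for illustration:

```python
from statistics import mean, stdev


def rows_to_delete(values, first_row=2, n_sigma=3):
    """Return worksheet row numbers of outliers, highest row first.

    values[0] is assumed to come from worksheet row `first_row`
    (row 1 being a header). A value counts as an outlier when it lies
    strictly more than n_sigma sample standard deviations from the mean.
    """
    m = mean(values)
    limit = n_sigma * stdev(values)
    outliers = [first_row + i for i, v in enumerate(values) if abs(v - m) > limit]
    # Highest row first, so that deleting one row never renumbers
    # a row that still has to be deleted.
    return sorted(outliers, reverse=True)


# Made-up column: 28 ordinary readings and two extreme ones.
values = [10.0] * 30
values[3] = 1000.0
values[20] = -1000.0

rows = rows_to_delete(values)
print(rows)
```

With openpyxl, each returned number could then be passed to sheet.delete_rows(r) in that order, followed by wb.save(...) to write the result.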

Related

python counting rows with missing values doesn't work

I don't know why, but my code to count the rows with missing values doesn't work.
Can somebody please help?
[image: Excel file showing the data]
[image: code in IDE]
In Excel, 156 rows in total have missing values, but I can't reproduce this number in Python
using the code below:
(kidney_df.isna().sum(axis=1) > 0).sum()
count=0
for i in kidney_df.isnull().sum(axis=1):
    if i>0:
        count=count+1
kidney_df.isna().sum().sum()
kidney_df is a whole DataFrame. Do you want to count every empty cell, or just the empty cells in one column? Based on the formula in your image, it seems you are interested only in column 'Z'. You can select it with .iloc[] (integer index location) or by its column name (not visible in your image), like so:
kidney_df.iloc[:, 25].isnull().sum()
Explanation:
.iloc[] # integer index location
: # all rows, from the first to the last (note that '0:-1' would leave out the last row)
25 # zero-based position of Excel column 'Z' (the 26th column), assuming the DataFrame columns line up one-to-one with the Excel columns
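The three counts that tend to get mixed up here (rows with at least one missing value, total missing cells, and missing cells in one column) can be compared on a small frame. A sketch assuming pandas is installed; the column names and values are made up and are not the asker's data:

```python
import pandas as pd

nan = float("nan")

# Small made-up frame: 4 rows, 3 columns, 4 missing cells in total.
df = pd.DataFrame(
    {
        "age": [48, nan, 62, 51],
        "bp": [80, 70, nan, nan],
        "sg": [1.02, 1.01, 1.01, nan],
    }
)

# Rows that contain at least one missing value.
rows_with_missing = int(df.isna().any(axis=1).sum())

# Every missing cell in the whole frame.
missing_cells = int(df.isna().sum().sum())

# Missing cells in a single column, selected by name or by position.
missing_in_bp = int(df["bp"].isna().sum())
missing_in_second_col = int(df.iloc[:, 1].isna().sum())

print(rows_with_missing, missing_cells, missing_in_bp)
```

If the Python count still differs from Excel, one possible cause is that some blanks were read as empty strings or placeholder text rather than NaN, which isna() does not flag.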

How to find position of a row which has the maximum occupied columns in excel using python

My Excel file has garbage values above the header row. I want to find the index of the row with the largest number of occupied columns and then use that row as the header.
Without more information about what you mean by header, I'll assume it should end up as the first row.
So this iterates through the rows, finds the first row with the most data (measured by the position of its last non-empty cell), and deletes all rows above it.
The data I work with in CSV format:
some,,,
useless,01/01/70,,
0,garbage,,
according,,to author,
head1,head2,head3,head4
dat1,dat2,dat3,dat4
dat2,dat3,dat4,dat5
dat3,dat4,dat5,dat6
from openpyxl import load_workbook
wb = load_workbook("./garbage.xlsx")
ws = wb.active
data_len = []  # position of the last non-empty cell in each row
for row in ws.iter_rows(values_only=True):  # iterate through rows
    # len(row) is always the same since this is a table
    for i in reversed(range(len(row))):  # iterate through cells from end to beginning
        if row[i] is not None:  # if cell is not empty
            data_len.append(i)  # save cell position in data length list
            break  # move to next row
    else:
        data_len.append(-1)  # completely empty row; keeps list indices aligned with rows
header_index = data_len.index(max(data_len))  # first (0-based) index of max data length
if header_index > 0:
    ws.delete_rows(1, header_index)  # openpyxl rows are 1-based: delete the rows above the header
for row in ws.iter_rows(values_only=True):  # only header and data remain
    print(row)
NB: this works as long as the data-length criterion is right for your file, but I would definitely not use it in production.
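The question asks for the row with the most occupied columns, which is a slightly different criterion from the position of the last non-empty cell. A sketch of that variant on plain row tuples like those returned by iter_rows(values_only=True); the function name find_header_index is hypothetical, and treating empty strings as empty cells is an assumption:

```python
def find_header_index(rows):
    """Return the 0-based index of the first row with the most non-empty cells."""
    best_index, best_count = 0, -1
    for index, row in enumerate(rows):
        # None and "" count as empty; 0 is a real value and counts as occupied.
        count = sum(1 for cell in row if cell is not None and cell != "")
        if count > best_count:  # strict '>' keeps the first row on ties
            best_index, best_count = index, count
    return best_index


# The sample data from the answer, as row tuples.
rows = [
    ("some", None, None, None),
    ("useless", "01/01/70", None, None),
    (0, "garbage", None, None),
    ("according", None, "to author", None),
    ("head1", "head2", "head3", "head4"),
    ("dat1", "dat2", "dat3", "dat4"),
]

header_index = find_header_index(rows)
cleaned = rows[header_index:]  # header row followed by the data rows
print(header_index)
```

On a worksheet, the rows above the header could then be dropped with ws.delete_rows(1, header_index) whenever header_index is greater than 0.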

Getting modified data from an original data set, into a new 2D list, in the right rows and columns

Totally newbie here, but really want to learn. I am in a class where we are learning to use basics, like nested for-statements, so no pandas. And for this assignment, to hand in, we cannot import anything, only csv module to work with csv files. We cannot use max() nor min() nor sort() nor sorted().
So there is a big csv file, with countries and populations and years. Countries down the left column, years along the first row, and populations. Like 20 countries, 18 columns of years.
I have imported that csv file into my IDE, as a 2D list. Then I applied calculations to it to get the min population, max population and population change. This data I now want to save into a new 2D list. THIS IS WHERE I get stuck.
#my own min() and max() defs.
def maxValue(listValues):
    maximum = 0
    for counter in listValues:
        if counter > maximum:
            maximum = counter
    return maximum
def minValue(listValues):
    minimum = listValues[0]
    for counter in listValues:
        if counter < minimum:
            minimum = counter
    return minimum
#convert the data in the 2D list into integers.
#I start at the second row, since first row is years,
#and start that the second column since first column is countries
for row in range(1, len(dataset)):
    for column in range(1,len(dataset[row])):
        dataset[row][column] = int(dataset[row][column])
#I go through data set, make calculations, and then try
# to append them to a new listAnalyzed, which would
#have only 4 columns, with country, maxpop, minpop, popchange.
listAnalyzed = []
for row in dataset[1:]:
    for colum in row:
        maxPop = maxValue(row[1:])
        minPop = minValue(row[1:])
        chgPop = ((row[18] - row[1]) / row[1]) * 100
    listAnalyzed.append(row[0])
    listAnalyzed.append(maxPop)
    listAnalyzed.append(minPop)
    listAnalyzed.append(chgPop)
print(listAnalyzed)
But when I print listAnalyzed, I just get all the data in a single 1D list. I tried different combos, sometimes getting 18 rows in a 2D list, but all the data in the first row only. The more I messed with the code, the worse it got. So I stopped before it just got me more confused.
What I need is a 2D list with 18 rows, each row with 4 columns (country, max, min, change). How do I do it?
Thank you! Really chugging here :(
Your solution would look something like the following:
listAnalyzed = []
for row in dataset[1:]:
    # row[1:] already holds every year for this country, so no inner loop is needed
    maxPop = maxValue(row[1:])
    minPop = minValue(row[1:])
    chgPop = ((row[18] - row[1]) / row[1]) * 100
    # append one 4-item list per country to get a 2D result
    listAnalyzed.append([row[0], maxPop, minPop, chgPop])
print(listAnalyzed)
You just need to put the required fields in a list and then append that list to listAnalyzed, so each country becomes one row.
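To see the shape of the result, the same loop can be run on a tiny made-up dataset. This sketch assumes the layout described in the question (years in the first row, countries in the first column); the figures are invented, row[-1] stands in for row[18] so that it works for any number of year columns, and maxValue starts from the first element rather than 0:

```python
def maxValue(listValues):
    # Start from the first element so the function also works
    # if every value happens to be negative.
    maximum = listValues[0]
    for value in listValues:
        if value > maximum:
            maximum = value
    return maximum


def minValue(listValues):
    minimum = listValues[0]
    for value in listValues:
        if value < minimum:
            minimum = value
    return minimum


# Invented data: header row, then one row per country (already integers).
dataset = [
    ["Country", "2000", "2001", "2002"],
    ["Aland", 100, 120, 150],
    ["Bland", 200, 180, 150],
]

listAnalyzed = []
for row in dataset[1:]:
    years = row[1:]  # every population value for this country
    chgPop = ((row[-1] - row[1]) / row[1]) * 100  # change from first to last year, in percent
    # One inner list per country gives the 2D structure.
    listAnalyzed.append([row[0], maxValue(years), minValue(years), chgPop])

for line in listAnalyzed:
    print(line)
```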

Is there a way to delete an entire row and shift the cells up in xlwings?

If I wanted to delete an entire row and shift the cells up is there a way to do that? Below is a snippet of my loop which is iterating through the column and clearing the contents of the cell if it doesn't match my parameters. Is there a way rather than clearing just the cell in column A I could delete the whole row and shift up?
for i in range(lastRow):
    i = i + 1
    if sheet.range('A' + str(i)).value != 'DLQ' or 'DLR':
        xw.Range('A' + str(i)).clear()
        continue
    else:
        continue
Use delete() and specify the row number(s) you want to delete in range():
import xlwings as xw
wb = xw.Book(r"test.xlsx")
wb.sheets[0].range("2:2").delete()
This deletes row 2 and shifts the rows below it up.
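Two details in the question's loop are worth a sketch: value != 'DLQ' or 'DLR' is always truthy, because the non-empty string 'DLR' is evaluated on its own, and deleting rows while counting upward skips rows as they shift. The version below picks the rows on plain Python values; the helper name rows_to_remove and the sample column are assumptions:

```python
def rows_to_remove(column_values, keep=("DLQ", "DLR")):
    """Return 1-based row numbers whose value is not in `keep`, bottom row first."""
    rows = [n for n, value in enumerate(column_values, start=1) if value not in keep]
    # Bottom row first, so deleting a row never shifts one still to be deleted.
    return rows[::-1]


# Made-up contents of column A, top to bottom.
column_a = ["DLQ", "ABC", "DLR", None, "XYZ", "DLQ"]

to_delete = rows_to_remove(column_a)
print(to_delete)
```

Each number could then be passed to xlwings as sheet.range(f"{r}:{r}").delete(), in the order returned.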

iterrows() loop is only reading last value and only modifying first row

I have a dataframe test. My goal is to search in the column t1 for specific strings, and if it matches exactly a specific string, put that string in the next column over called t1_selected. Only thing is, I can't get iterrows() to go over the entire dataframe, and to report results in respective rows.
for index, row in test.iterrows():
    if any(['ABCD_T1w_MPR_vNav_passive' in row['t1']]):
        #x = ast.literal_eval(row['t1'])
        test.loc[i, 't1_selected'] = str(['ABCD_T1w_MPR_vNav_passive'])
I am only trying to get ABCD_T1w_MPR_vNav_passive to be in the 4th row under the t1_selected, while all the other rows will have not found. The first entry in t1_selected is from the last row under t1 which I didn't include in the screenshot because the dataframe has over 200 rows.
I tried to initialize an empty list to append output of
import ast
x = ast.literal_eval(row['t1'])
to see if I can put x in there, but the same issue occurred.
Is there anything I am missing?
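Use the loop variable index instead of i when writing the result:
for index, row in test.iterrows():
    if 'ABCD_T1w_MPR_vNav_passive' in row['t1']:
        test.loc[index, 't1_selected'] = str(['ABCD_T1w_MPR_vNav_passive'])
Here index is the label of the row being written to. i is not the loop variable, so it kept whatever value it had before the loop and every match was written to the same row.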
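For a frame with a few hundred rows the loop is fine, but the same result can be had without iterrows() by using a boolean mask. A sketch assuming pandas is installed and that t1 holds strings; the sample rows are made up, and it writes the plain target string rather than str([...]):

```python
import pandas as pd

target = "ABCD_T1w_MPR_vNav_passive"

# Made-up stand-in for the asker's frame; t1 is assumed to hold strings.
test = pd.DataFrame(
    {
        "t1": [
            "['ABCD_T1w_MPR_vNav']",
            "['ABCD_T1w_MPR_vNav_passive']",
            "['ABCD_T2w_SPC_vNav']",
        ]
    }
)

# Default for every row, then overwrite only the rows that match.
test["t1_selected"] = "not found"
mask = test["t1"].str.contains(target, regex=False, na=False)  # plain substring test
test.loc[mask, "t1_selected"] = target

print(test["t1_selected"].tolist())
```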
