Reading a .xlsx and accessing cell values but not by their position - python

This is my first question, so sorry in advance if I make some explanation mistakes.
I'm coding in python 2.7.
I wrote a .xlsx (Excel) file (it could have been a .xls, I don't really need the macro + VBA at this point). The Excel file looks like this:
The values are linked to the name of the column and the name of the row. For example, I have a column named "Curve 1" and a row named "Number of extremum", so in that cell I wrote "1" because curve 1 has 1 extremum.
I want to take this value in order to manipulate it in a Python script.
I know I can use the xlrd module with open_workbook, put the values of row 1 ("Number of extremum") in a list and then take only the first one (corresponding to the column "Curve 1" and hence to the value "1" I want), but this isn't what I would like to do.
Instead, I would like to access the "1" cell value by giving the Python script only the strings "Curve 1" and "Number of extremum"; Python would then access the cell at the intersection of the two and take its value: "1". Is that possible?
I would like to do this because the Excel file will change over time and cells could be moved. So if I access a cell value by its "position number" (like row 1, column 1), I would have a problem if a column or a row were added at that position. I would prefer not to have to edit the Python script again whenever the .xlsx file is edited.
Thank you very much.

Pandas is a popular 3rd party library for reading/writing datasets. You can use pd.DataFrame.at for efficient scalar access via row and column labels:
import pandas as pd

# read the file, using the first column as the row labels
df = pd.read_excel('file.xlsx', index_col=0)

# extract the value at the intersection of the row and column labels
val = df.at['N of extremum', 'Curve 1']

This is very easy using Pandas. To obtain the cell you want, you can use loc, which lets you specify the row and the column by label, just like you want.
import pandas

# again, index_col=0 makes the first column the row labels
df = pandas.read_excel('test.xlsx', index_col=0)
df.loc['N of extremum', 'Curve 1']

Related

Is there any function by which we can access a specific value from a specific cell in Excel from Python (Jupyter notebook)?

I have data in an Excel file. I want to add three columns in Python with the following values:
["Exam ID": EXAM-001, "Subject": Maths, "Date": 30-07-2022]
I have hundreds of Excel files and I don't want to hard-code anything. I just want a function/some code that directly accesses the value at a given cell address in the Excel file and displays the output in Python.
I have also mentioned the expected result below -
(screenshot of the expected result)
Thanks in Advance !!!
There are plenty of libraries.
I use openpyxl.
With it you should be able to open the workbook, find specific cells either by ID or by value, add new cells, and append rows.
https://openpyxl.readthedocs.io/en/stable/
Examples : https://www.softwaretestinghelp.com/python-openpyxl-tutorial/
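For example, a rough sketch with openpyxl (the *.xlsx file pattern and the cell addresses B1-B3 are assumptions, adjust them to where the values actually sit in your sheets):

import glob
from openpyxl import load_workbook

# loop over every workbook in the folder instead of hard-coding file names
for path in glob.glob("*.xlsx"):
    wb = load_workbook(path, data_only=True)  # data_only=True gives cell values instead of formulas
    ws = wb.active
    # read the values directly by cell address
    exam_id = ws["B1"].value
    subject = ws["B2"].value
    exam_date = ws["B3"].value
    print(path, exam_id, subject, exam_date)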

How can I check values of an excel sheet using Python?

I have an Excel sheet with stock price data, and I want to write code that checks whether a guess is correct or not. So I need to compare a value from my Python script to the value on the Excel sheet.
I have tried using repl.it with a .csv file, but it was not compatible and I was not able to check my values. I have also tried using a .xlsx file on repl.it, but I still could not access the values.
Is there any way I can compare the values?
Try this:
import pandas as pd

table = pd.read_excel('file_name.xlsx')
for row in table.values.tolist():
    first_value_in_row = row[0]
    second_value_in_row = row[1]
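If you only need to check a single guess against one cell, a minimal sketch (the cell position and the guess value are assumptions, adapt them to your sheet) could look like:

import pandas as pd

table = pd.read_excel('file_name.xlsx')

guess = 123.45             # hypothetical value produced elsewhere in your script
actual = table.iloc[0, 0]  # value in the first row, first column of the sheet

print("Correct" if actual == guess else "Incorrect")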

How to fix a blank column being added at the far left when reading my excel file?

I'm working on a rather large Excel file, and as part of it I'd like to insert two columns at the far right of a new Excel file, which works; however, whenever I do so, an unnamed column full of numbers appears at the far left.
This is for an Excel file. I've tried using the .drop feature, using a new file, and reading about CSV files, but I cannot seem to apply any of it here, so nothing solves it.
wdf = pd.read_excel(tLoc)
sheet_wdf_map = pd.read_excel(tLoc, sheet_name=None)
wdf['Adequate'] = np.nan
wdf['Explanation'] = np.nan
wdf = wdf.drop(" ", axis=1)
I expect the output to be my original columns plus the two new columns at the far right, without the unnamed column.
Add index_col=[0] as an argument to read_excel.
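For example (reusing the tLoc path from the question's code; the to_excel call is only a guess at how the file is being written back out):

# read the sheet, treating the first column as the index instead of data
wdf = pd.read_excel(tLoc, index_col=[0])

# if you write the frame back out, skipping the index avoids re-creating
# that unnamed column in the first place
wdf.to_excel('output.xlsx', index=False)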

Python CSV formatting issue when writing specific columns to output file then opening in Excel

The Problem
I have a CSV file that contains a large number of items.
The first column can contain either an IP address or random garbage. The only other column I care about is the fourth one.
I have written the below snippet of code in an attempt to check if the first column is an IP address and, if so, write that and the contents of the fourth column to another CSV file side by side.
import csv
import re

with open('results.csv', 'r') as csvresults:
    filecontent = csv.reader(csvresults)
    output = open('formatted_results.csv', 'w')
    processedcontent = csv.writer(output)
    for row in filecontent:
        first = str(row[0])
        fourth = str(row[3])
        if re.match(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', first) != None:
            processedcontent.writerow(["{},{}".format(first, fourth)])
        else:
            continue
    output.close()
This works to an extent. However, when viewing in Excel, both items are placed in a single cell rather than two adjacent ones. If I open it in notepad I can see that each line is wrapped in quotation marks. If these are removed Excel will display the columns properly.
Example Input
1.2.3.4,rubbish1,rubbish2,reallyimportantdata
Desired Output
1.2.3.4 reallyimportantdata - two separate columns
Actual Output
"1.2.3.4,reallyimportantdata" - single column
The Question
Is there any way to fudge the format part to not write out with quotations? Alternatively, what would be the best way to achieve what I'm trying to do?
I've tried writing out to another file and stripping the lines but, despite not throwing any errors, the result was the same...
writerow() takes a list of elements and writes each of those elements into its own column. Since you are feeding it a list with only one element (the pre-formatted string), everything is placed into one column.
Instead, feed writerow() a list with the two values as separate elements:
processedcontent.writerow([first, fourth])
Have you considered using Pandas?
import re
import pandas as pd

df = pd.read_csv("myFile.csv", header=0, low_memory=False, index_col=None)
fid = open("outputp.csv", "w")
for index, row in df.iterrows():
    aa = re.match(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", row['IP'])
    if aa:
        # include a newline so each match ends up on its own line
        tline = '{0},{1}\n'.format(row['IP'], row['fourth column'])
        fid.write(tline)
fid.close()
There may be an error or two and I got the regex from here.
This assumes the first row of the CSV has column titles which can be referenced. If it does not, you can use header=None and reference the columns with iloc.
Come to think of it, you could probably run the regex on the DataFrame, copy the first and fourth columns to a new DataFrame, and use the to_csv method in pandas.
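A rough sketch of that DataFrame-level approach (the column names 'IP' and 'fourth column' are the same assumptions as in the loop version above):

import pandas as pd

df = pd.read_csv("myFile.csv", header=0, low_memory=False, index_col=None)

# keep only the rows whose first column looks like an IP address
ip_pattern = r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"
matched = df[df['IP'].str.match(ip_pattern, na=False)]

# copy just the two columns of interest and write them out without the index
matched[['IP', 'fourth column']].to_csv("formatted_results.csv", index=False)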

Find matching rows in a csv file using two column matching (Python)

I need to find the matching row in a CSV file where the value in a column named destination matches one variable from my Python script and the value in another column named line matches a second variable from the script, so that I can retrieve the value of another column (named code) from that row.
How do I do that?
Thanks for your help.
Use the pandas library in Python:
import pandas as pd

filename = "filename.csv"
df = pd.read_csv(filename)

variabletomatch = "whateveryouwanttomatch"
for idx, row in df.iterrows():
    tomatch = row["columnname"]
    if tomatch == variabletomatch:
        print("MATCHED")
