Find matching rows in a CSV file using two-column matching (Python)

I need to find the matching row in a CSV file where the value in a column named destination matches one variable from my Python script and the value in another column named line matches a second variable, so that I can retrieve the value of a third column (named code) from that row.
How do I do that?
Thanks for your help.

Use the pandas library in Python:
import pandas as pd

filename = "filename.csv"
df = pd.read_csv(filename)

variabletomatch = "whateveryouwanttomatch"
for idx, row in df.iterrows():
    tomatch = row["columnname"]
    if tomatch == variabletomatch:
        print("MATCHED")

Related

How to extract data from a specific column in the first CSV file to another column in another CSV file?

I have two different CSV files which I have imported using pd.read_csv.
Both files have different header names. I would like to export the column under the header ["Model"] in the first CSV file to the second CSV file under the header ["Product"].
I have tried the following code, but it produced a ValueError:
writer = df1[df1['Model'] == df2['Product']]
Would appreciate any help.
Try joining the DataFrames on the index using pandas.DataFrame.join, then exporting the result as a CSV using pandas.DataFrame.to_csv:
joined = df1.join(df2)
joined.to_csv('./df2.csv')
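If the goal is simply to move that one column across, a simpler sketch is to assign it directly, assuming both files have the same number of rows and the rows line up by position (the file names here are hypothetical):
import pandas as pd

df1 = pd.read_csv('first.csv')    # file containing the "Model" column
df2 = pd.read_csv('second.csv')   # file containing the "Product" column

# copy the values from "Model" into "Product", matching rows by position
df2['Product'] = df1['Model'].values

df2.to_csv('second_updated.csv', index=False)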

Is there a way to import several .txt files each becoming a separate dataframe using pandas?

I have to work with 50+ .txt files each containing 2 columns and 631 rows where I have to do different operations to each (sometimes with each other) before doing data analysis. I was hoping there was a way to import each text file under a different dataframe in pandas instead of doing it individually. The code I've been using individually has been
df = pd.read_table(file_name, skiprows=1, index_col=0)
print(df)
I use index_col=0 because the first column holds the x-values. I use skiprows=1 because I have to drop the title, which is the first row (and the file name in the folder) of each .txt file. I was thinking maybe I could use the glob module to import everything as a single data frame from the folder and then split it into different dataframes, keeping the first column as the name of each variable. Is there a feasible way to import all of these files at once as separate dataframes from a folder and store them under the first column name? All .txt files would be data frames of 2 columns x 631 rows, not including the first title row. All values in the columns are integers.
Thank you
Yes. If you store your file names in a list named filelist (maybe using glob), you can use the following dict comprehension to read all the files and store them in a dict:
dfdict = {f: pd.read_table(f,...) for f in filelist}
Then you can access each data frame with dfdict["filename.txt"].
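A fuller sketch of that idea, assuming the .txt files sit in a hypothetical data/ folder and take the same read_table arguments as in the question:
import glob
import pandas as pd

# collect every .txt file in the folder
filelist = glob.glob('data/*.txt')

# one DataFrame per file, keyed by its file path
dfdict = {f: pd.read_table(f, skiprows=1, index_col=0) for f in filelist}

# each entry is an independent DataFrame
for name, df in dfdict.items():
    print(name, df.shape)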

Reading a .xlsx and accessing cell values but not by their position

This is my first question, so sorry in advance if I make some explanation mistakes.
I'm coding in Python 2.7.
I wrote a .xlsx (Excel) file (it could have been a .xls; I don't really need macros + VBA at this point). The Excel file is laid out as follows:
The values are linked to the name of the column and the name of the row. For example, I have a column named "Curve 1" and a row named "Number of extremum", so in that cell I wrote "1" because Curve 1 has 1 extremum.
I want to take this value in order to manipulate it in a Python script.
I know I can use the xlrd module with open_workbook, put the values of row 1 ("Number of extremum") in a list and then only take the first one (corresponding to the column "Curve 1" and so to the value "1" I want), but this isn't what I would like to have.
Instead, I would like to access the "1" cell value by only giving the Python script the strings "Curve 1" and "Number of extremum", and Python would access the cell at the intersection of the two and take its value: "1". Is that possible?
I would like to do this because the Excel file will change over time and cells could be moved. So if I try to access the cell value by its position (like row 1, column 1), I would have a problem if a column or a row is added at that position. I would like to avoid having to edit the Python script again whenever the .xlsx file is edited.
Thank you very much.
Pandas is a popular third-party library for reading and writing datasets. You can use pd.DataFrame.at for efficient scalar access via row and column labels:
import pandas as pd

# read the file, using the first column as the row labels
df = pd.read_excel('file.xlsx', index_col=0)

# extract the value at row "N of extremum", column "Curve 1"
val = df.at['N of extremum', 'Curve 1']
This is very easy using Pandas. To obtain the cell you want, you can use loc, which lets you specify the row and column by label:
import pandas

# again, the first column is used as the row labels
df = pandas.read_excel('test.xlsx', index_col=0)
df.loc['N of extremum', 'Curve 1']

Python: Import Excel Data and lookup values in Dictionary

Total beginner to Python: I'm trying to import Excel values from a column, look the imported values up in a Python dictionary (I was able to create this), and then write the results into the Excel file and see if they match another column in the file.
You can use a module called pandas.
pip install pandas
To read the file use the following:
import pandas as pd
file = pd.ExcelFile('path/to/excelsheet/').parse('sheet_you_want_to_use') # 'Sheet 1' for Sheet 1
You can now access the columns using the column names as keys: file['column_name'].
You can then collect the looked-up values in a list and write them to an Excel file as follows:
values = ['....values....']
pd.DataFrame(values).to_excel('where/to/save/file')
I would advise you to read the following documentation:
pandas DataFrame
pandas ExcelFile
pandas to_excel
pandas
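Putting those pieces together, here is a minimal sketch of the whole lookup-and-compare flow; the file name input.xlsx, the column names Key and Expected, and the lookup dictionary are all hypothetical stand-ins for your own:
import pandas as pd

# hypothetical dictionary built earlier in the script
lookup = {'A': 1, 'B': 2, 'C': 3}

df = pd.read_excel('input.xlsx')

# look each imported value up in the dictionary
df['Result'] = df['Key'].map(lookup)

# check whether the looked-up results match another column
df['Matches'] = df['Result'] == df['Expected']

# write everything back out to a new Excel file
df.to_excel('output.xlsx', index=False)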

Reading from a specific row/column of an Excel CSV file

I am a beginner at Python and I'm looking to take 3 specific columns, starting at a certain row, from a .csv spreadsheet and then import each into Python.
For example
I would need to take 1000 rows' worth of data from column F, starting at row 12.
I've looked at options using csv and pandas but I can't figure out how to have them start importing at a certain row/column.
Any help would be greatly appreciated.
If the spreadsheet is not huge, the easiest approach is to load the entire CSV file into Python using the csv module and then extract the required rows and columns. For example:
import csv

with open('Book1.csv', newline='') as f:
    rows = list(csv.reader(f))

data = [row[5] for row in rows[11:11 + 1000]]
will do the trick. Remember that Python starts numbering from 0, so row[5] is column F from your spreadsheet and rows[11] is row 12.
CSV files being plain text files, there is no way to jump straight to a certain line; you will have to read line by line and count. Have a look at the csv module in the Python documentation, which explains how to read lines easily.
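To illustrate the line-by-line counting idea without loading the whole file into memory, here is a sketch using itertools.islice, again assuming column F is index 5 and the data starts at row 12:
import csv
from itertools import islice

with open('Book1.csv', newline='') as f:
    reader = csv.reader(f)
    # skip the first 11 rows, then take the next 1000 (rows 12 to 1011)
    data = [row[5] for row in islice(reader, 11, 11 + 1000)]

print(len(data))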
