Python Pandas VLOOKUP function with categorical and non-numeric values
I want to optimize a "vlookup"-style process in Python that works but is not scalable in its current form. I have tried pandas' pivot_table and pivot, but they have been of limited use because of the alphanumeric and string values in the cells. I have two tables:
table1:
ProductID  Sales
123456     34
abc123     34
123def     34
a1234f     34
1abcd6     34
table2:
Brand   Site1   Site2   Site3
Brand1  123456  N/A     N/A
Brand2  N/A     abc123  N/A
Brand1  N/A     N/A     123def
Brand2  N/A     1abcd6  N/A
Brand1  a1234f  N/A     N/A
What I originally wanted to see was sales by brand:
Brand   Sales
Brand1  102
Brand2  68
Here's the pseudocode I've basically built out in Python and Pandas:
import pandas as pd

# read sales and product tables into pandas
sales_df = pd.read_csv(table1)
product_df = pd.read_csv(table2)

# isolate each product id column (keeping Brand) into separate dfs
product_site1_df = product_df.drop(['Site2', 'Site3'], axis=1)
product_site2_df = product_df.drop(['Site1', 'Site3'], axis=1)
product_site3_df = product_df.drop(['Site1', 'Site2'], axis=1)

# rename and append all product ids into a single column
product_site1_df = product_site1_df.rename(columns={"Site1": "ProductID"})
product_site2_df = product_site2_df.rename(columns={"Site2": "ProductID"})
product_site3_df = product_site3_df.rename(columns={"Site3": "ProductID"})
product_list_master_df = pd.concat([product_site1_df, product_site2_df, product_site3_df])

# compare sales df and product df, pulling brand in as a new column to the sales table
inner_join = pd.merge(sales_df,
                      product_list_master_df,
                      on='ProductID',
                      how='inner')
This is obviously very procedural, not scalable, computationally redundant, and a very roundabout way to get what I want. Additionally, I'm losing information; for example, I can't easily do a pivot based on sites rather than sales. Short of changing the data model itself, what can I do here to improve speed, versatility, and line count?
Assuming the dataframes are named df1 and df2, you can reshape and map to perform the VLOOKUP, then groupby+sum:
(df2.set_index('Brand')
    .stack()
    .map(df1.set_index('ProductID')['Sales'])
    .groupby(level='Brand').sum()
)
Output:
Brand
Brand1 102
Brand2 68
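If you also want to keep the site information around for other pivots (the versatility concern in the question), an alternative sketch, assuming the same df1/df2 names as above, is to melt df2 into long format and merge. This is standard pandas, but the column names are taken from the sample tables:

import pandas as pd

# Reshape the wide site columns into one long ProductID column,
# keeping Brand and Site so other pivots stay possible.
long_df = df2.melt(id_vars='Brand', var_name='Site', value_name='ProductID')
long_df = long_df.merge(df1, on='ProductID', how='inner')  # inner join drops N/A and unmatched IDs

sales_by_brand = long_df.groupby('Brand')['Sales'].sum()
sales_by_site = long_df.groupby('Site')['Sales'].sum()   # same long table, different pivot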
Here's how you can do it without pandas, using just Python's standard csv module and a Counter (for your sales by brand):
import csv
from collections import Counter

# Create a product/sales lookup
sales_by_product = {}
with open('sales.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header
    for row in reader:
        p_id, sales = row
        sales_by_product[p_id] = int(sales)

sales_by_brand_counter = Counter()
with open('products.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header
    for row in reader:
        brand_id = row[0]
        for p_id in row[1:]:
            sales = sales_by_product.get(p_id, 0)
            sales_by_brand_counter[brand_id] += sales

with open('sales_by_brand.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Brand', 'Sales'])
    rows = [[elem, cnt] for (elem, cnt) in sales_by_brand_counter.items()]
    writer.writerows(rows)
When I run that with sales.csv:
ProductID,Sales
123456,34
abc123,34
123def,34
a1234f,34
1abcd6,34
and products.csv:
Brand,Site1,Site2,Site3
Brand1,123456,N/A,N/A
Brand2,N/A,abc123,N/A
Brand1,N/A,N/A,123def
Brand2,N/A,1abcd6,N/A
Brand1,a1234f,N/A,N/A
I get sales_by_brand.csv:
Brand,Sales
Brand1,102
Brand2,68
The work that really matters, finding product IDs and summing sales, is handled here:

for row in reader:
    brand_id = row[0]
    for p_id in row[1:]:
        sales = sales_by_product.get(p_id, 0)
        sales_by_brand_counter[brand_id] += sales
It can read through as many Site columns as there are. If a site cell contains 'N/A' or a product ID that isn't in the lookup dict, it just adds 0 for that brand.
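As a small illustration of that fallback (hypothetical values, not taken from the post):

sales_by_product = {'123456': 34}
sales_by_product.get('N/A', 0)     # -> 0, unknown keys fall back to the default
sales_by_product.get('123456', 0)  # -> 34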
Related
How to group data by count of columns in Pandas?
I have a CSV file with a lot of rows and a different number of columns per row. How can I group the data by column count and show each group in a different frame? The CSV file has the following data:

1 OLEG US FRANCE BIG
1 OLEG FR 18
1 NATA 18

Because each row has a different number of columns, I have to group the rows by column count and show three frames so I can then set the headers:

FR1:
ID NAME STATE COUNTRY HOBBY
1 OLEG US FRANCE BIG

FR2:
ID NAME COUNTRY AGE
1 OLEG FR 18

FR3:
ID NAME AGE
1 NATA 18

In other words, I need to group the rows by column count and show them in different dataframes.
Since pandas doesn't allow rows of different lengths in one DataFrame, just don't use it to import your data. Your goal is to create three separate DataFrames, so first import the data as lists and then deal with the differing lengths. One way to solve this is to read the data with csv.reader and create the DataFrames with list comprehensions that filter on the length of each list:

import csv
import pandas as pd

with open('input.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    data = list(reader)

df1 = pd.DataFrame([item for item in data if len(item) == 3], columns='ID NAME AGE'.split())
df2 = pd.DataFrame([item for item in data if len(item) == 4], columns='ID NAME COUNTRY AGE'.split())
df3 = pd.DataFrame([item for item in data if len(item) == 5], columns='ID NAME STATE COUNTRY HOBBY'.split())

print(df1, df2, df3, sep='\n\n')

  ID  NAME AGE
0  1  NATA  18

  ID  NAME COUNTRY AGE
0  1  OLEG      FR  18

  ID  NAME STATE COUNTRY HOBBY
0  1  OLEG    US  FRANCE   BIG

If you would have to hardcode too many lines for the same step (i.e. too many DataFrames), you should consider using a loop to create them and store each DataFrame as a key/value pair in a dictionary (the EDIT below does exactly that, and there is a further sketch after this answer).

EDIT: Here is a slightly more streamlined way of creating those DataFrames. I think you can't get around listing the columns you want to use for the separate DataFrames, so you need to know which column-count variations occur in your data (unless you want to create the DataFrames without naming the columns):

col_list = [['ID', 'NAME', 'AGE'],
            ['ID', 'NAME', 'COUNTRY', 'AGE'],
            ['ID', 'NAME', 'STATE', 'COUNTRY', 'HOBBY']]

with open('input.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    data = list(reader)

dict_of_dfs = {}
for cols in col_list:
    dict_of_dfs[f'df_{len(cols)}'] = pd.DataFrame(
        [item for item in data if len(item) == len(cols)], columns=cols)

for key, val in dict_of_dfs.items():
    print(f'{key=}: \n {val} \n')

key='df_3':
   ID  NAME AGE
0   1  NATA  18

key='df_4':
   ID  NAME COUNTRY AGE
0   1  OLEG      FR  18

key='df_5':
   ID  NAME STATE COUNTRY HOBBY
0   1  OLEG    US  FRANCE   BIG

Now you don't have separate variables for each DataFrame; instead they live in a dictionary as values. (I named each DataFrame after its column count, so df_3 is the one with three columns.) If you need to import the data with pandas, you could have a look at this post.
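If you genuinely don't know in advance which column counts occur, a further hedged sketch (my addition, not part of the answer above) is to group the rows by length first and decide on column names afterwards:

import csv
from collections import defaultdict
import pandas as pd

# Group raw rows by their length without hardcoding a column list
rows_by_len = defaultdict(list)
with open('input.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        rows_by_len[len(row)].append(row)

# One DataFrame per row length; columns stay numeric until you assign names
dict_of_dfs = {f'df_{n}': pd.DataFrame(rows) for n, rows in rows_by_len.items()}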
My headers are in the first column of my txt file. I want to create a Pandas DF
Sample data from the text file:

[User]
employeeNo=123
last_name=Toole
first_name=Michael
language=english
email = michael.toole#123.ie
department=Marketing
role=Marketing Lead
[User]
employeeNo=456
last_name= Ronaldo
first_name=Juan
language=Spanish
email=juan.ronaldo#sms.ie
department=Data Science
role=Team Lead
Location=Spain
[User]
employeeNo=998
last_name=Lee
first_name=Damian
language=english
email=damian.lee#email.com
[User]

Wondering if someone could help me; you can see my sample dataset above. What I would like to do (please tell me if there is a more efficient way) is to loop through the first column and, wherever one of the unique ids occurs (e.g. first_name, last_name, role, etc.), append the value in the corresponding row to that list, and do this for each unique ID so that I'm left with one row per user, as in the screenshot I attached. I have read about multi-indexing and I'm not sure if that might be a better solution, but I couldn't get it to work (I'm quite new to Python).

# Define a list of selected persons
selectedList = textFile
# Define a list of searching person
searchList = ['uid']
# Define an empty list
foundList = []
# Iterate each element from the selected list
for index, sList in enumerate(textFile):
    # Match the element with the element of searchList
    if sList in searchList:
        # Store the value in foundList if the match is found
        foundList.append(selectedList[index])
You have a text file where each record starts with a [User] line and the data lines have a key=value format. I know of no module able to handle that automatically, but it is easy to parse by hand. The code could be:

import pandas as pd

with open('file.txt') as fd:
    data = []                          # a list of records
    for line in fd:
        line = line.strip()            # strip end of line
        if line == '[User]':           # new record
            row = {}                   # row will be a key: value dict
            data.append(row)
        else:
            k, v = line.split('=', 1)  # split on the = character
            row[k] = v

df = pd.DataFrame(data)                # list of key: value dicts => dataframe

With the sample data shown, we get:

  employeeNo last_name first_name language                 email    department            role                 email Location
0        123     Toole    Michael  english  michael.toole#123.ie     Marketing  Marketing Lead                   NaN      NaN
1        456   Ronaldo       Juan  Spanish                   NaN  Data Science       Team Lead   juan.ronaldo#sms.ie    Spain
2        998       Lee     Damian  english                   NaN           NaN             NaN  damian.lee#email.com      NaN
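One caveat with the sample data: the line "email = michael.toole#123.ie" has spaces around "=", which creates a separate key and hence the duplicated email column above. A small hedged tweak is to strip whitespace while parsing:

# inside the else branch of the loop above
k, v = line.split('=', 1)
row[k.strip()] = v.strip()  # so 'email = x' and 'email=x' land in the same column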
I'm sure there is a more optimal way to do this, but my approach is to get a unique list of row names, extract each group in a loop, and combine them into a new dataframe. Finally, set the desired column names and order.

import pandas as pd
import numpy as np
import io

data = '''
[User]
employeeNo=123
last_name=Toole
first_name=Michael
language=english
email=michael.toole#123.ie
department=Marketing
role="Marketing Lead"
[User]
employeeNo=456
last_name= Ronaldo
first_name=Juan
language=Spanish
email=juan.ronaldo#sms.ie
department="Data Science"
role=Team Lead
Location=Spain
[User]
employeeNo=998
last_name=Lee
first_name=Damian
language=english
email=damian.lee#email.com
[User]
'''

df = pd.read_csv(io.StringIO(data), sep='=', comment='[', header=None)
new_cols = df[0].unique()
new_df = pd.DataFrame()

for col in new_cols:
    tmp = df[df[0] == col]
    tmp.reset_index(inplace=True)
    new_df = pd.concat([new_df, tmp[1]], axis=1)

new_df.columns = new_cols
new_df['User'] = None
new_df = new_df[['User', 'employeeNo', 'last_name', 'first_name', 'language',
                 'email', 'department', 'role', 'Location']]
new_df

   User employeeNo last_name first_name language                 email    department            role Location
0  None        123     Toole    Michael  english  michael.toole#123.ie     Marketing  Marketing Lead    Spain
1  None        456   Ronaldo       Juan  Spanish   juan.ronaldo#sms.ie  Data Science       Team Lead      NaN
2  None        998       Lee     Damian  english  damian.lee#email.com           NaN             NaN      NaN
Rewrite based on testing of the previous version's offset values:

import pandas as pd

# Revised from previous answer - ensures key value pairs are contained to the same
# record - previous version assumed the first record had all the expected keys -
# inadvertently assigned (Location) value of second record to the first record
# which did not have a Location key
# This version should perform better - only dealing with one single df
# - and using pandas own pivot() function

textFile = 'file.txt'
filter = '[User]'

# Decoration - enabling a check and balance - how many users are we processing?
textFileOpened = open(textFile, 'r')
initialRead = textFileOpened.read()
userCount = initialRead.count(filter)  # sample has 4 [User] entries - but only three actual unique records
print('User Count {}'.format(userCount))

# Create sets so able to manipulate and interrogate
allData = []
oneRow = []
userSeq = 0

# Iterate through file - assign record key and [userSeq] key to each pair
with open(textFile, 'r') as fp:
    for fileLineSeq, line in enumerate(fp):
        if filter in str(line):
            userSeq = userSeq + 1  # Ensures each key value pair is grouped
        else:
            userSeq = userSeq
        oneRow = [fileLineSeq, userSeq, line]
        allData.append(oneRow)

df = pd.DataFrame(allData)
df.columns = ['FileRow', 'UserSeq', 'KeyValue']                   # rename columns
userSeparators = df[df['KeyValue'] == str(filter + '\n')].index   # Locate [User] records
df.drop(userSeparators, inplace=True)                             # Remove [User] records
df = df.replace(' = ', '=', regex=True)                           # Input data dirty - cleaning up
df = df.replace('\n', '', regex=True)                             # remove the new lines appended during the list generation
# print(df)  # Test as necessary here

# split KeyValue column into two
df[['Key', 'Value']] = df.KeyValue.str.split('=', expand=True)

# very powerful function - convert to table
df = df.pivot(index='UserSeq', columns='Key', values='Value')
print(df)

Results:

User Count 4
Key     Location    department                 email employeeNo first_name language last_name            role
UserSeq
1            NaN     Marketing  michael.toole#123.ie        123    Michael  english     Toole  Marketing Lead
2          Spain  Data Science   juan.ronaldo#sms.ie        456       Juan  Spanish   Ronaldo       Team Lead
3            NaN           NaN  damian.lee#email.com        998     Damian  english       Lee             NaN
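One caveat, assuming records never repeat a key: df.pivot() raises a ValueError if the same (UserSeq, Key) pair occurs twice. If that can happen in your data, a hedged fallback is pivot_table with an explicit aggregation:

# keep only the first value seen for a duplicated key within a record
df = df.pivot_table(index='UserSeq', columns='Key', values='Value', aggfunc='first')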
CSV File Transpose Column to Row in Python
I've been wrecking my head with this and I probably just need to step back. I have a CSV file like this (dummy data; there could be 1-20 parameters):

CAR,NAME,AGE,COLOUR
Ford,Mike,45,Blue
VW,Peter,67,Yellow

And I need:

CAR,PARAMETER,VALUE
Ford,NAME,Mike
Ford,AGE,45
Ford,COLOUR,Blue
VW,NAME,Peter
VW,AGE,67
VW,COLOUR,Yellow

I'm looking at "How to transpose a dataset in a csv file?" and "Python writing a .csv file with rows and columns transpose", but I think that because I want to keep the CAR column static, the Python zip function might not hack it. Any thoughts on this sunny Friday, gurus? Regards! (See also: Python - Transpose columns to rows within data operation and before writing to file.)
Use pandas:

import pandas as pd

df_in = pd.read_csv('infile.csv')
df_out = df_in.set_index('CAR').stack().reset_index()
df_out.columns = ['CAR', 'PARAMETER', 'VALUE']
df_out.to_csv('outfile.csv', index=False)

Input and output example:

>>> df_in
    CAR   NAME  AGE  COLOUR
0  Ford   Mike   45    Blue
1    VW  Peter   67  Yellow

>>> df_out
    CAR PARAMETER   VALUE
0  Ford      NAME    Mike
1  Ford       AGE      45
2  Ford    COLOUR    Blue
3    VW      NAME   Peter
4    VW       AGE      67
5    VW    COLOUR  Yellow
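A roughly equivalent sketch uses pd.melt, which keeps CAR as an id column directly (note the rows come out grouped by parameter rather than by car, so sort afterwards if that matters):

import pandas as pd

df_in = pd.read_csv('infile.csv')
df_out = df_in.melt(id_vars='CAR', var_name='PARAMETER', value_name='VALUE')
df_out = df_out.sort_values('CAR')  # optional: group the rows back by car
df_out.to_csv('outfile.csv', index=False)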
I was able to use "Python - Transpose columns to rows within data operation and before writing to file" with some tweaks, and all is working well now.

import csv

with open('transposed.csv', 'wt') as destfile:
    writer = csv.writer(destfile)
    writer.writerow(['car', 'parameter', 'value'])
    with open('input.csv', 'rt') as sourcefile:
        for d in csv.DictReader(sourcefile):
            car = d.pop('car')
            for parameter, value in sorted(d.items()):
                row = [car, parameter.upper(), value]
                writer.writerow(row)
CSV filtering and ascending order
New to Python, so I need a bit of help. I have a CSV file that has an id, a created_at date, and first/last name columns:

id  created_at  first_name  last_name
1   1309380645  Cecelia     Holt
2   1237178109  Emma        Allison
3   1303585711  Desiree     King
4   1231175716  Sam         Davidson

I want to filter the rows between two dates, let's say 03-22-2016 and 04-15-2016 (the exact dates don't really matter), and then order those rows in ascending order by created_at. I know this code will just show all or most of the data:

import csv
from datetime import datetime

with open("sample_data.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        print(" ".join(row))

But I'm not sure how to do the rest, or how to filter using a timestamp like 1309380645. Would using pandas be more beneficial for me than using csv? Any help is much appreciated, as is a guide or book to read for more understanding.
I recommend using pandas, since it will help you filter and perform further analysis faster.

# import pandas and datetime
import pandas as pd
import datetime

# read csv file
df = pd.read_csv("sample_data.csv")

# convert created_at from unix time to datetime
df['created_at'] = pd.to_datetime(df['created_at'], unit='s')

# contents of df at this point
#    id          created_at first_name last_name
# 0   1 2011-06-29 20:50:45    Cecelia      Holt
# 1   2 2009-03-16 04:35:09       Emma   Allison
# 2   3 2011-04-23 19:08:31    Desiree      King
# 3   4 2009-01-05 17:15:16        Sam  Davidson

# filtering example
df_filtered = df[(df['created_at'] <= datetime.date(2011, 3, 22))]

# output of df_filtered
#    id          created_at first_name last_name
# 1   2 2009-03-16 04:35:09       Emma   Allison
# 3   4 2009-01-05 17:15:16        Sam  Davidson

# filter based on dates mentioned in the question
df_filtered = df[(df['created_at'] >= datetime.date(2016, 3, 22)) & (df['created_at'] <= datetime.date(2016, 4, 15))]

# output of df_filtered would be empty at this point since the
# dates are out of this range

# sort
df_sorted = df_filtered.sort_values(['created_at'])

Explanation of filtering in pandas: the first thing you need to know is that using a comparison operator on a column returns a Series of boolean values.

df['id'] > 2

would return:

False
False
True
True

Pandas supports logical indexing, so if you index the dataframe with those boolean values, it returns only the rows that correspond to True.

df[df['id'] > 2]

returns:

3 1303585711 Desiree King
4 1231175716 Sam Davidson

This is how you can filter easily in pandas.
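As a compact variant of the same boolean filter, and assuming created_at has already been converted to datetime as above, Series.between can express the date window in one call:

# rows whose created_at falls inside the window, sorted ascending
mask = df['created_at'].between('2016-03-22', '2016-04-15')
df_filtered = df[mask].sort_values('created_at')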
Downloading and installing (and learning) pandas just to do this seems like overkill. Here's how to do it using only Python's built-in modules:

import csv
from datetime import datetime, date
import sys

start_date = date(2011, 1, 1)
end_date = date(2011, 12, 31)

# Read csv data into memory filtering rows by the date in column 2 (row[1]).
csv_data = []
with open("sample_data.csv", newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    csv_data.append(header)
    for row in reader:
        creation_date = date.fromtimestamp(int(row[1]))
        if start_date <= creation_date <= end_date:
            csv_data.append(row)

if csv_data:  # Anything found?
    # Print the results in ascending date order.
    print(" ".join(csv_data[0]))
    # Converting the timestamp to int may not be necessary (but doesn't hurt)
    for row in sorted(csv_data[1:], key=lambda r: int(r[1])):
        print(" ".join(row))
How to convert series to dataframe in Pandas
I have two CSVs that I need to compare based on one column, and I need to put matched rows in one csv and unmatched rows in another. So, I created an index on that column in the second csv and looped through the first:

df1 = pd.read_csv(file1, nrows=100)
df2 = pd.read_csv(file2, nrows=100)
df2.set_index('crc', inplace=True)

matched_list = []
non_matched_list = []
for _, row in df1.iterrows():
    try:
        x = df2.loc[row['crc']]
        matched_list.append(x)
    except KeyError:
        non_matched_list.append(row)

The x here is a Series in the following format:

policyID                 448094
statecode                    FL
county              CLAY COUNTY
eq_site_limit         1322376.3
hu_site_limit         1322376.3
fl_site_limit         1322376.3
fr_site_limit         1322376.3
tiv_2011              1322376.3
tiv_2012             1438163.57
eq_site_deductible            0
hu_site_deductible          0.0
fl_site_deductible            0
fr_site_deductible            0
point_latitude        30.063936
point_longitude      -81.707664
line                Residential
construction            Masonry
point_granularity             3
Name: 448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,0,0.0, dtype: object

My output csv should be in the following format:

policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1

for all the Series in the matched and unmatched lists. How do I do it? I cannot get rid of the index on the second csv, as performance is important. Following are the contents of the two csv files.

File1:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,589658,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,12563.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0,515035.62,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3
995932,FL,CLAY COUNTY,0,19260000,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1

File2:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0,51564.9,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3
995932,FL,CLAY COUNTY,0,457962,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1
253816,FL,CLAY COUNTY,831498.3,831498.3,831498.3,831498.3,831498.3,1117791.48,0,0,0,0,30.10216,-81.719444,Residential,Masonry,1
894922,FL,CLAY COUNTY,0,24059.09,0,0,24059.09,33952.19,0,0,0,0,30.095957,-81.695099,Residential,Wood,1

Edit: Added sample csv.
I think you can do it this way, instead of looping:

df1.loc[df1.crc.isin(df2.index)].to_csv('/path/to/matched.csv', index=False)
df1.loc[~df1.crc.isin(df2.index)].to_csv('/path/to/unmatched.csv', index=False)

Demo:

In [62]: df1.loc[df1.crc.isin(df2.index)].to_csv(r'c:/temp/matched.csv', index=False)

In [63]: df1.loc[~df1.crc.isin(df2.index)].to_csv(r'c:/temp/unmatched.csv', index=False)

Results:

matched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0.0,0,0,30.063935999999998,-81.70766400000001,Residential,Masonry,3
333743,FL,CLAY COUNTY,0.0,12563.76,0.0,0.0,79520.76,86854.48,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Wood,3
172534,FL,CLAY COUNTY,0.0,254281.5,0.0,254281.5,254281.5,246144.49,0,0.0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0.0,515035.62,0.0,0.0,515035.62,884419.17,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Masonry,3
995932,FL,CLAY COUNTY,0.0,19260000.0,0.0,0.0,19260000.0,20610000.0,0,0.0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500.0,328500.0,328500.0,328500.0,328500.0,348374.25,0,16425.0,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000.0,315000.0,315000.0,315000.0,315000.0,265821.57,0,15750.0,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600.0,705600.0,705600.0,705600.0,705600.0,1010842.56,14112,35280.0,0,0,30.100628000000004,-81.703751,Residential,Masonry,1

unmatched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,589658.0,498960.0,498960.0,498960.0,498960.0,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0.0,0,0,30.089578999999997,-81.700455,Residential,Wood,1
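If you ever need to see both sides at once, a hedged alternative sketch is a left merge with indicator=True, which tags every row of df1 as matched ('both') or unmatched ('left_only'):

# assumes crc is unique in df2, otherwise the merge can duplicate df1 rows
merged = df1.merge(df2.reset_index()[['crc']], on='crc', how='left', indicator=True)
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
unmatched = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')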