Python Pandas VLOOKUP function with categorical and non-numeric values
I want to optimize a "vlookup"-style process in Python that works but is not scalable in its current form. I have tried pandas' pivot_table and pivot, but they have been of limited use because of the alphanumeric and string values in the cells. I have two tables:
table1:
ProductID  Sales
123456     34
abc123     34
123def     34
a1234f     34
1abcd6     34
table2:
Brand   Site1   Site2   Site3
Brand1  123456  N/A     N/A
Brand2  N/A     abc123  N/A
Brand1  N/A     N/A     123def
Brand2  N/A     1abcd6  N/A
Brand1  a1234f  N/A     N/A
What I originally wanted to see was sales by brand:
Brand   Sales
Brand1  102
Brand2  68
Here's the pseudocode I've basically built out in Python and Pandas:
import pandas as pd

# read sales and product tables into pandas
sales_df = pd.read_csv(table1)
product_df = pd.read_csv(table2)

# isolate each product id column (keeping Brand) into separate dfs
product_site1_df = product_df.drop(['Site2', 'Site3'], axis=1)
product_site2_df = product_df.drop(['Site1', 'Site3'], axis=1)
product_site3_df = product_df.drop(['Site1', 'Site2'], axis=1)

# rename and append all product ids into a single column
product_site1_df = product_site1_df.rename(columns={"Site1": "ProductID"})
product_site2_df = product_site2_df.rename(columns={"Site2": "ProductID"})
product_site3_df = product_site3_df.rename(columns={"Site3": "ProductID"})
product_list_master_df = pd.concat([product_site1_df, product_site2_df, product_site3_df])

# compare sales df and product df, pulling brand in as a new column to the sales table
inner_join = pd.merge(sales_df,
                      product_list_master_df,
                      on='ProductID',
                      how='inner')
This is obviously very procedural, not scalable, computationally redundant, and a very roundabout way to get what I want. Additionally, I'm losing information; for example, I can't easily do a pivot based on sites rather than sales. Short of changing the data model itself, what can I do here to improve speed, versatility, and line count?
Assuming the dataframes are named df1 and df2, you can reshape and map to perform the VLOOKUP, then groupby+sum:
(df2.set_index('Brand')
    .stack()
    .map(df1.set_index('ProductID')['Sales'])
    .groupby(level='Brand').sum()
)
Output:
Brand
Brand1 102
Brand2 68
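If you also want to keep the site information around for other pivots (the versatility concern in the question), an alternative sketch, assuming the same df1/df2 names as above, is to melt df2 into long format and merge. This is standard pandas, but the column names are taken from the sample tables:

import pandas as pd

# Reshape the wide site columns into one long ProductID column,
# keeping Brand and Site so other pivots stay possible.
long_df = df2.melt(id_vars='Brand', var_name='Site', value_name='ProductID')
long_df = long_df.merge(df1, on='ProductID', how='inner')  # inner join drops N/A and unmatched IDs

sales_by_brand = long_df.groupby('Brand')['Sales'].sum()
sales_by_site = long_df.groupby('Site')['Sales'].sum()   # same long table, different pivot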
Here's how you can do it without pandas, using just Python's standard csv module and a Counter (for your sales by brand):
import csv
from collections import Counter

# Create a product/sales lookup
sales_by_product = {}
with open('sales.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header
    for row in reader:
        p_id, sales = row
        sales_by_product[p_id] = int(sales)

sales_by_brand_counter = Counter()
with open('products.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header
    for row in reader:
        brand_id = row[0]
        for p_id in row[1:]:
            sales = sales_by_product.get(p_id, 0)
            sales_by_brand_counter[brand_id] += sales

with open('sales_by_brand.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Brand', 'Sales'])
    rows = [[elem, cnt] for (elem, cnt) in sales_by_brand_counter.items()]
    writer.writerows(rows)
When I run that with sales.csv:
ProductID,Sales
123456,34
abc123,34
123def,34
a1234f,34
1abcd6,34
and products.csv:
Brand,Site1,Site2,Site3
Brand1,123456,N/A,N/A
Brand2,N/A,abc123,N/A
Brand1,N/A,N/A,123def
Brand2,N/A,1abcd6,N/A
Brand1,a1234f,N/A,N/A
I get sales_by_brand.csv:
Brand,Sales
Brand1,102
Brand2,68
The work that really matters, finding product IDs and summing sales, is handled here:

for row in reader:
    brand_id = row[0]
    for p_id in row[1:]:
        sales = sales_by_product.get(p_id, 0)
        sales_by_brand_counter[brand_id] += sales
It can read through as many Site columns as there are. If a site cell contains 'N/A' or a product ID that isn't in the lookup dict, it just adds 0 for that brand.
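As a small illustration of that fallback (hypothetical values, not taken from the post):

sales_by_product = {'123456': 34}
sales_by_product.get('N/A', 0)     # -> 0, unknown keys fall back to the default
sales_by_product.get('123456', 0)  # -> 34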
Related
How to group data by count of columns in Pandas?
I have a CSV file with a lot of rows and a different number of columns per row. How can I group the data by column count and show each group in a different frame? The CSV file has the following data:

1 OLEG US FRANCE BIG
1 OLEG FR 18
1 NATA 18

Because each row has a different number of columns, I have to group the rows by column count and show three frames so I can then set the headers:

FR1:
ID NAME STATE COUNTRY HOBBY
1 OLEG US FRANCE BIG

FR2:
ID NAME COUNTRY AGE
1 OLEG FR 18

FR3:
ID NAME AGE
1 NATA 18

In other words, I need to group the rows by column count and show them in different dataframes.
Since pandas doesn't allow rows of different lengths in one DataFrame, just don't use it to import your data. Your goal is to create three separate DataFrames, so first import the data as lists and then deal with the differing lengths. One way to solve this is to read the data with csv.reader and create the DataFrames with list comprehensions that filter on the length of each list:

import csv
import pandas as pd

with open('input.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    data = list(reader)

df1 = pd.DataFrame([item for item in data if len(item) == 3], columns='ID NAME AGE'.split())
df2 = pd.DataFrame([item for item in data if len(item) == 4], columns='ID NAME COUNTRY AGE'.split())
df3 = pd.DataFrame([item for item in data if len(item) == 5], columns='ID NAME STATE COUNTRY HOBBY'.split())

print(df1, df2, df3, sep='\n\n')

  ID  NAME AGE
0  1  NATA  18

  ID  NAME COUNTRY AGE
0  1  OLEG      FR  18

  ID  NAME STATE COUNTRY HOBBY
0  1  OLEG    US  FRANCE   BIG

If you would have to hardcode too many lines for the same step (i.e. too many DataFrames), you should consider using a loop to create them and store each DataFrame as a key/value pair in a dictionary (the EDIT below does exactly that, and there is a further sketch after this answer).

EDIT: Here is a slightly more streamlined way of creating those DataFrames. I think you can't get around listing the columns you want to use for the separate DataFrames, so you need to know which column-count variations occur in your data (unless you want to create the DataFrames without naming the columns):

col_list = [['ID', 'NAME', 'AGE'],
            ['ID', 'NAME', 'COUNTRY', 'AGE'],
            ['ID', 'NAME', 'STATE', 'COUNTRY', 'HOBBY']]

with open('input.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    data = list(reader)

dict_of_dfs = {}
for cols in col_list:
    dict_of_dfs[f'df_{len(cols)}'] = pd.DataFrame(
        [item for item in data if len(item) == len(cols)], columns=cols)

for key, val in dict_of_dfs.items():
    print(f'{key=}: \n {val} \n')

key='df_3':
   ID  NAME AGE
0   1  NATA  18

key='df_4':
   ID  NAME COUNTRY AGE
0   1  OLEG      FR  18

key='df_5':
   ID  NAME STATE COUNTRY HOBBY
0   1  OLEG    US  FRANCE   BIG

Now you don't have separate variables for each DataFrame; instead they live in a dictionary as values. (I named each DataFrame after its column count, so df_3 is the one with three columns.) If you need to import the data with pandas, you could have a look at this post.
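If you genuinely don't know in advance which column counts occur, a further hedged sketch (my addition, not part of the answer above) is to group the rows by length first and decide on column names afterwards:

import csv
from collections import defaultdict
import pandas as pd

# Group raw rows by their length without hardcoding a column list
rows_by_len = defaultdict(list)
with open('input.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        rows_by_len[len(row)].append(row)

# One DataFrame per row length; columns stay numeric until you assign names
dict_of_dfs = {f'df_{n}': pd.DataFrame(rows) for n, rows in rows_by_len.items()}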
My headers are in the first column of my txt file. I want to create a Pandas DF
Sample data from the text file:

[User]
employeeNo=123
last_name=Toole
first_name=Michael
language=english
email = michael.toole#123.ie
department=Marketing
role=Marketing Lead
[User]
employeeNo=456
last_name= Ronaldo
first_name=Juan
language=Spanish
email=juan.ronaldo#sms.ie
department=Data Science
role=Team Lead
Location=Spain
[User]
employeeNo=998
last_name=Lee
first_name=Damian
language=english
email=damian.lee#email.com
[User]

Wondering if someone could help me; you can see my sample dataset above. What I would like to do (please tell me if there is a more efficient way) is to loop through the first column and, wherever one of the unique ids occurs (e.g. first_name, last_name, role, etc.), append the value in the corresponding row to that list, and do this for each unique ID so that I'm left with one row per user, as in the screenshot I attached. I have read about multi-indexing and I'm not sure if that might be a better solution, but I couldn't get it to work (I'm quite new to Python).

# Define a list of selected persons
selectedList = textFile
# Define a list of searching person
searchList = ['uid']
# Define an empty list
foundList = []
# Iterate each element from the selected list
for index, sList in enumerate(textFile):
    # Match the element with the element of searchList
    if sList in searchList:
        # Store the value in foundList if the match is found
        foundList.append(selectedList[index])
You have a text file where each record starts with a [User] line and the data lines have a key=value format. I know of no module able to handle that automatically, but it is easy to parse by hand. The code could be:

import pandas as pd

with open('file.txt') as fd:
    data = []                          # a list of records
    for line in fd:
        line = line.strip()            # strip end of line
        if line == '[User]':           # new record
            row = {}                   # row will be a key: value dict
            data.append(row)
        else:
            k, v = line.split('=', 1)  # split on the = character
            row[k] = v

df = pd.DataFrame(data)                # list of key: value dicts => dataframe

With the sample data shown, we get:

  employeeNo last_name first_name language                 email    department            role                 email Location
0        123     Toole    Michael  english  michael.toole#123.ie     Marketing  Marketing Lead                   NaN      NaN
1        456   Ronaldo       Juan  Spanish                   NaN  Data Science       Team Lead   juan.ronaldo#sms.ie    Spain
2        998       Lee     Damian  english                   NaN           NaN             NaN  damian.lee#email.com      NaN
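One caveat with the sample data: the line "email = michael.toole#123.ie" has spaces around "=", which creates a separate key and hence the duplicated email column above. A small hedged tweak is to strip whitespace while parsing:

# inside the else branch of the loop above
k, v = line.split('=', 1)
row[k.strip()] = v.strip()  # so 'email = x' and 'email=x' land in the same column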
I'm sure there is a more optimal way to do this, but my approach is to get a unique list of row names, extract each group in a loop, and combine them into a new dataframe. Finally, set the desired column names and order.

import pandas as pd
import numpy as np
import io

data = '''
[User]
employeeNo=123
last_name=Toole
first_name=Michael
language=english
email=michael.toole#123.ie
department=Marketing
role="Marketing Lead"
[User]
employeeNo=456
last_name= Ronaldo
first_name=Juan
language=Spanish
email=juan.ronaldo#sms.ie
department="Data Science"
role=Team Lead
Location=Spain
[User]
employeeNo=998
last_name=Lee
first_name=Damian
language=english
email=damian.lee#email.com
[User]
'''

df = pd.read_csv(io.StringIO(data), sep='=', comment='[', header=None)
new_cols = df[0].unique()
new_df = pd.DataFrame()

for col in new_cols:
    tmp = df[df[0] == col]
    tmp.reset_index(inplace=True)
    new_df = pd.concat([new_df, tmp[1]], axis=1)

new_df.columns = new_cols
new_df['User'] = None
new_df = new_df[['User', 'employeeNo', 'last_name', 'first_name', 'language',
                 'email', 'department', 'role', 'Location']]
new_df

   User employeeNo last_name first_name language                 email    department            role Location
0  None        123     Toole    Michael  english  michael.toole#123.ie     Marketing  Marketing Lead    Spain
1  None        456   Ronaldo       Juan  Spanish   juan.ronaldo#sms.ie  Data Science       Team Lead      NaN
2  None        998       Lee     Damian  english  damian.lee#email.com           NaN             NaN      NaN
Rewrite based on testing of the previous version's offset values:

import pandas as pd

# Revised from previous answer - ensures key value pairs are contained to the same
# record - previous version assumed the first record had all the expected keys -
# inadvertently assigned (Location) value of second record to the first record
# which did not have a Location key
# This version should perform better - only dealing with one single df
# - and using pandas own pivot() function

textFile = 'file.txt'
filter = '[User]'

# Decoration - enabling a check and balance - how many users are we processing?
textFileOpened = open(textFile, 'r')
initialRead = textFileOpened.read()
userCount = initialRead.count(filter)  # sample has 4 [User] entries - but only three actual unique records
print('User Count {}'.format(userCount))

# Create sets so able to manipulate and interrogate
allData = []
oneRow = []
userSeq = 0

# Iterate through file - assign record key and [userSeq] key to each pair
with open(textFile, 'r') as fp:
    for fileLineSeq, line in enumerate(fp):
        if filter in str(line):
            userSeq = userSeq + 1  # Ensures each key value pair is grouped
        else:
            userSeq = userSeq
        oneRow = [fileLineSeq, userSeq, line]
        allData.append(oneRow)

df = pd.DataFrame(allData)
df.columns = ['FileRow', 'UserSeq', 'KeyValue']                   # rename columns
userSeparators = df[df['KeyValue'] == str(filter + '\n')].index   # Locate [User] records
df.drop(userSeparators, inplace=True)                             # Remove [User] records
df = df.replace(' = ', '=', regex=True)                           # Input data dirty - cleaning up
df = df.replace('\n', '', regex=True)                             # remove the new lines appended during the list generation
# print(df)  # Test as necessary here

# split KeyValue column into two
df[['Key', 'Value']] = df.KeyValue.str.split('=', expand=True)

# very powerful function - convert to table
df = df.pivot(index='UserSeq', columns='Key', values='Value')
print(df)

Results:

User Count 4
Key     Location    department                 email employeeNo first_name language last_name            role
UserSeq
1            NaN     Marketing  michael.toole#123.ie        123    Michael  english     Toole  Marketing Lead
2          Spain  Data Science   juan.ronaldo#sms.ie        456       Juan  Spanish   Ronaldo       Team Lead
3            NaN           NaN  damian.lee#email.com        998     Damian  english       Lee             NaN
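One caveat, assuming records never repeat a key: df.pivot() raises a ValueError if the same (UserSeq, Key) pair occurs twice. If that can happen in your data, a hedged fallback is pivot_table with an explicit aggregation:

# keep only the first value seen for a duplicated key within a record
df = df.pivot_table(index='UserSeq', columns='Key', values='Value', aggfunc='first')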
CSV File Transpose Column to Row in Python
I've been wrecking my head with this and I probably just need to step back. I have a CSV file like this (dummy data; there could be 1-20 parameters):

CAR,NAME,AGE,COLOUR
Ford,Mike,45,Blue
VW,Peter,67,Yellow

And I need:

CAR,PARAMETER,VALUE
Ford,NAME,Mike
Ford,AGE,45
Ford,COLOUR,Blue
VW,NAME,Peter
VW,AGE,67
VW,COLOUR,Yellow

I'm looking at "How to transpose a dataset in a csv file?" and "Python writing a .csv file with rows and columns transpose", but I think that because I want to keep the CAR column static, the Python zip function might not hack it. Any thoughts on this sunny Friday, gurus? Regards! (See also: Python - Transpose columns to rows within data operation and before writing to file.)
Use pandas:

import pandas as pd

df_in = pd.read_csv('infile.csv')
df_out = df_in.set_index('CAR').stack().reset_index()
df_out.columns = ['CAR', 'PARAMETER', 'VALUE']
df_out.to_csv('outfile.csv', index=False)

Input and output example:

>>> df_in
    CAR   NAME  AGE  COLOUR
0  Ford   Mike   45    Blue
1    VW  Peter   67  Yellow

>>> df_out
    CAR PARAMETER   VALUE
0  Ford      NAME    Mike
1  Ford       AGE      45
2  Ford    COLOUR    Blue
3    VW      NAME   Peter
4    VW       AGE      67
5    VW    COLOUR  Yellow
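A roughly equivalent sketch uses pd.melt, which keeps CAR as an id column directly (note the rows come out grouped by parameter rather than by car, so sort afterwards if that matters):

import pandas as pd

df_in = pd.read_csv('infile.csv')
df_out = df_in.melt(id_vars='CAR', var_name='PARAMETER', value_name='VALUE')
df_out = df_out.sort_values('CAR')  # optional: group the rows back by car
df_out.to_csv('outfile.csv', index=False)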
I was able to use "Python - Transpose columns to rows within data operation and before writing to file" with some tweaks, and all is working well now.

import csv

with open('transposed.csv', 'wt') as destfile:
    writer = csv.writer(destfile)
    writer.writerow(['car', 'parameter', 'value'])
    with open('input.csv', 'rt') as sourcefile:
        for d in csv.DictReader(sourcefile):
            car = d.pop('car')
            for parameter, value in sorted(d.items()):
                row = [car, parameter.upper(), value]
                writer.writerow(row)
CSV filtering and ascending order
New to Python, so I need a bit of help. I have a CSV file that has an id, a created_at date, and first/last name columns:

id  created_at  first_name  last_name
1   1309380645  Cecelia     Holt
2   1237178109  Emma        Allison
3   1303585711  Desiree     King
4   1231175716  Sam         Davidson

I want to filter the rows between two dates, let's say 03-22-2016 and 04-15-2016 (the exact dates don't really matter), and then order those rows in ascending order by created_at. I know this code will just show all or most of the data:

import csv
from datetime import datetime

with open("sample_data.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        print(" ".join(row))

But I'm not sure how to do the rest, or how to filter using a timestamp like 1309380645. Would using pandas be more beneficial for me than using csv? Any help is much appreciated, as is a guide or book to read for more understanding.
I recommend using pandas, since it will help you filter and perform further analysis faster.

# import pandas and datetime
import pandas as pd
import datetime

# read csv file
df = pd.read_csv("sample_data.csv")

# convert created_at from unix time to datetime
df['created_at'] = pd.to_datetime(df['created_at'], unit='s')

# contents of df at this point
#    id          created_at first_name last_name
# 0   1 2011-06-29 20:50:45    Cecelia      Holt
# 1   2 2009-03-16 04:35:09       Emma   Allison
# 2   3 2011-04-23 19:08:31    Desiree      King
# 3   4 2009-01-05 17:15:16        Sam  Davidson

# filtering example
df_filtered = df[(df['created_at'] <= datetime.date(2011, 3, 22))]

# output of df_filtered
#    id          created_at first_name last_name
# 1   2 2009-03-16 04:35:09       Emma   Allison
# 3   4 2009-01-05 17:15:16        Sam  Davidson

# filter based on dates mentioned in the question
df_filtered = df[(df['created_at'] >= datetime.date(2016, 3, 22)) & (df['created_at'] <= datetime.date(2016, 4, 15))]

# output of df_filtered would be empty at this point since the
# dates are out of this range

# sort
df_sorted = df_filtered.sort_values(['created_at'])

Explanation of filtering in pandas: the first thing you need to know is that using a comparison operator on a column returns a Series of boolean values.

df['id'] > 2

would return:

False
False
True
True

Pandas supports logical indexing, so if you index the dataframe with those boolean values, it returns only the rows that correspond to True.

df[df['id'] > 2]

returns:

3 1303585711 Desiree King
4 1231175716 Sam Davidson

This is how you can filter easily in pandas.
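As a compact variant of the same boolean filter, and assuming created_at has already been converted to datetime as above, Series.between can express the date window in one call:

# rows whose created_at falls inside the window, sorted ascending
mask = df['created_at'].between('2016-03-22', '2016-04-15')
df_filtered = df[mask].sort_values('created_at')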
Downloading and installing (and learning) pandas just to do this seems like overkill. Here's how to do it using only Python's built-in modules:

import csv
from datetime import datetime, date
import sys

start_date = date(2011, 1, 1)
end_date = date(2011, 12, 31)

# Read csv data into memory filtering rows by the date in column 2 (row[1]).
csv_data = []
with open("sample_data.csv", newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    csv_data.append(header)
    for row in reader:
        creation_date = date.fromtimestamp(int(row[1]))
        if start_date <= creation_date <= end_date:
            csv_data.append(row)

if csv_data:  # Anything found?
    # Print the results in ascending date order.
    print(" ".join(csv_data[0]))
    # Converting the timestamp to int may not be necessary (but doesn't hurt)
    for row in sorted(csv_data[1:], key=lambda r: int(r[1])):
        print(" ".join(row))
How to convert series to dataframe in Pandas
I have two CSVs that I need to compare based on one column, and I need to put matched rows in one csv and unmatched rows in another. So, I created an index on that column in the second csv and looped through the first:

df1 = pd.read_csv(file1, nrows=100)
df2 = pd.read_csv(file2, nrows=100)
df2.set_index('crc', inplace=True)

matched_list = []
non_matched_list = []
for _, row in df1.iterrows():
    try:
        x = df2.loc[row['crc']]
        matched_list.append(x)
    except KeyError:
        non_matched_list.append(row)

The x here is a Series in the following format:

policyID                 448094
statecode                    FL
county              CLAY COUNTY
eq_site_limit         1322376.3
hu_site_limit         1322376.3
fl_site_limit         1322376.3
fr_site_limit         1322376.3
tiv_2011              1322376.3
tiv_2012             1438163.57
eq_site_deductible            0
hu_site_deductible          0.0
fl_site_deductible            0
fr_site_deductible            0
point_latitude        30.063936
point_longitude      -81.707664
line                Residential
construction            Masonry
point_granularity             3
Name: 448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,0,0.0, dtype: object

My output csv should be in the following format:

policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1

for all the Series in the matched and unmatched lists. How do I do it? I cannot get rid of the index on the second csv, as performance is important. Following are the contents of the two csv files.

File1:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,589658,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,12563.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0,515035.62,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3
995932,FL,CLAY COUNTY,0,19260000,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1

File2:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0,51564.9,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3
995932,FL,CLAY COUNTY,0,457962,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1
253816,FL,CLAY COUNTY,831498.3,831498.3,831498.3,831498.3,831498.3,1117791.48,0,0,0,0,30.10216,-81.719444,Residential,Masonry,1
894922,FL,CLAY COUNTY,0,24059.09,0,0,24059.09,33952.19,0,0,0,0,30.095957,-81.695099,Residential,Wood,1

Edit: Added sample csv.
I think you can do it this way, instead of looping:

df1.loc[df1.crc.isin(df2.index)].to_csv('/path/to/matched.csv', index=False)
df1.loc[~df1.crc.isin(df2.index)].to_csv('/path/to/unmatched.csv', index=False)

Demo:

In [62]: df1.loc[df1.crc.isin(df2.index)].to_csv(r'c:/temp/matched.csv', index=False)

In [63]: df1.loc[~df1.crc.isin(df2.index)].to_csv(r'c:/temp/unmatched.csv', index=False)

Results:

matched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0.0,0,0,30.063935999999998,-81.70766400000001,Residential,Masonry,3
333743,FL,CLAY COUNTY,0.0,12563.76,0.0,0.0,79520.76,86854.48,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Wood,3
172534,FL,CLAY COUNTY,0.0,254281.5,0.0,254281.5,254281.5,246144.49,0,0.0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0.0,515035.62,0.0,0.0,515035.62,884419.17,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Masonry,3
995932,FL,CLAY COUNTY,0.0,19260000.0,0.0,0.0,19260000.0,20610000.0,0,0.0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500.0,328500.0,328500.0,328500.0,328500.0,348374.25,0,16425.0,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000.0,315000.0,315000.0,315000.0,315000.0,265821.57,0,15750.0,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600.0,705600.0,705600.0,705600.0,705600.0,1010842.56,14112,35280.0,0,0,30.100628000000004,-81.703751,Residential,Masonry,1

unmatched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,589658.0,498960.0,498960.0,498960.0,498960.0,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0.0,0,0,30.089578999999997,-81.700455,Residential,Wood,1
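If you ever need to see both sides at once, a hedged alternative sketch is a left merge with indicator=True, which tags every row of df1 as matched ('both') or unmatched ('left_only'):

# assumes crc is unique in df2, otherwise the merge can duplicate df1 rows
merged = df1.merge(df2.reset_index()[['crc']], on='crc', how='left', indicator=True)
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
unmatched = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')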