How to correctly use the AND operator in Python

I have Python code where I am trying to read a CSV file using pandas.
Once I have read the CSV, the user is asked to enter a first name, and I filter the records to show only the matching ones.
Code:
# importing the module
import pandas as pd
# read specific columns of csv file using Pandas
df = pd.read_csv(r'D:\Extension\firefox_extension\claimant_record.csv', usecols = ['ClaimantFirstName','ClaimantLastName','Occupation'])
print(df)
required_df_firstName = input("Enter First Name of Claimant: ")
select_df = df[df['ClaimantFirstName'] == required_df_firstName]
print(select_df)
Issue:
There are multiple records matching the first name, so I want the user to be able to narrow down the results using both first and last name, but I am not able to write the AND condition properly.
What I have tried so far:
select_df = df[df['ClaimantFirstName'] == required_df_firstName]and df[df['ClaimantLastName'] == required_df_lastName]

You can use boolean indexing to check more than one condition at once.
Here's how it would work:
import pandas as pd
# add sample data
data = {'ClaimantFirstName': ["Jane", "John"], "ClaimantLastName": ["Doe", "Doherty"]}
df = pd.DataFrame(data)
# filter by name
required_df_firstName, required_df_lastName = "John", "Doherty"
select_df = df.loc[(df['ClaimantFirstName'] == required_df_firstName) & (df['ClaimantLastName'] == required_df_lastName), ['ClaimantFirstName', 'ClaimantLastName']]
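Applied to your original snippet, the same filter looks like the sketch below; note that each comparison needs its own parentheses because & binds more tightly than ==:
required_df_firstName = input("Enter First Name of Claimant: ")
required_df_lastName = input("Enter Last Name of Claimant: ")
select_df = df[(df['ClaimantFirstName'] == required_df_firstName) &
               (df['ClaimantLastName'] == required_df_lastName)]
print(select_df)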

Check isin, which lets you filter a column against several values at once:
# required_df_firstName = input("Enter First Name of Claimant: ")
# note: isin expects a list-like of values, e.g. ['a', 'b']
select_df = df[df['ClaimantFirstName'].isin(required_df_firstName)]
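For example, a minimal sketch assuming the user types comma-separated first names, which are split into a list before filtering:
import pandas as pd

df = pd.DataFrame({'ClaimantFirstName': ["Jane", "John"],
                   'ClaimantLastName': ["Doe", "Doherty"]})

# split "Jane, John" into ['Jane', 'John'] so isin receives a list
names = [n.strip() for n in input("Enter first names (comma-separated): ").split(",")]
select_df = df[df['ClaimantFirstName'].isin(names)]
print(select_df)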


pandas dataframe not creating new column

I have data like this. What I am trying to do is create a rule based on domain names for my project. I want to create a new column named new_url based on the domain: if it contains .cdn., the new value should be the string before .cdn.; otherwise it should call the url parser library and parse the URL a different way. The problem is that in the CSV file I create (cleanurl.csv), there is no new_url column. When I print the parsed URLs in the code, I can see them, and the if and else branches are working. Could you help me please?
import pandas as pd
import url_parser
from url_parser import parse_url,get_url,get_base_url
import numpy as np
df = pd.read_csv("C:\\Users\\myuser\\Desktop\\raw_data.csv", sep=';')
i = -1
for x in df['domain']:
    i = i + 1
    print("*", x, "*")
    if '.cdn.' in x:
        parsed_url = x.split('.cdn')[0]
        print(parsed_url)
        df.iloc[i]['new_url'] = parsed_url
    else:
        parsed_url = get_url(x).domain + '.' + get_url(x).top_domain
        print(parsed_url)
        df.iloc[i]['new_url'] = parsed_url
df.to_csv("C:\\Users\\myuser\\Desktop\\cleanurl.csv", sep=';')
Use .loc[row, 'column'] to create the new column. df.iloc[i]['new_url'] = ... is chained indexing: it assigns to a temporary copy, so the column never appears in the original DataFrame. Assign through a single .loc call instead:
for idx, x in df['domain'].items():
    if '.cdn.' in x:
        parsed_url = x.split('.cdn')[0]
    else:
        parsed_url = get_url(x).domain + '.' + get_url(x).top_domain
    df.loc[idx, 'new_url'] = parsed_url
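As an alternative sketch with the same logic but no explicit loop, assuming the url_parser package imported in the question, the column can be built with Series.apply:
def to_new_url(x):
    # keep the part before '.cdn.' when present; otherwise rebuild domain.top_domain
    if '.cdn.' in x:
        return x.split('.cdn')[0]
    return get_url(x).domain + '.' + get_url(x).top_domain

df['new_url'] = df['domain'].apply(to_new_url)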

How can I use the .findall() function for an Excel file, iterating through all rows of a column?

I have a big Excel sheet with information about different companies, all in a single cell for each company, and my goal is to separate it into different columns by following patterns to scrape the info from the first column. The original data looks like this:
My goal is to achieve a dataframe like this:
I have created the following code using the patterns Mr., Affiliation:, E-mail:, and Mobile, because they are repeated the same way in every row. However, I don't know how to use the findall() function to scrape all the info I want from each row of the desired column.
import openpyxl
import re
import pandas as pd

wb = openpyxl.load_workbook('/Users/ap/info1.xlsx')
ws = wb.get_sheet_by_name('Companies')
w = {'Name': [], 'Affiliation': [], 'Email': []}
for row in ws.iter_rows('C{}:C{}'.format(ws.min_row, ws.max_row)):
    for cells in row:
        text = cells.value
        a = re.findall(r'Mr.(.*?)Affiliation:', text, re.DOTALL)
        a1 = "".join(a).replace('\n', ' ')
        b = re.findall(r'Affiliation:(.*?)E-mail', text, re.DOTALL)
        b1 = "".join(b).replace('\n', ' ')
        c = re.findall(r'E-mail(.*?)Mobile', text, re.DOTALL)
        c1 = "".join(c).replace('\n', ' ')
        w['Name'].append(a1)
        w['Affiliation'].append(b1)
        w['Email'].append(c1)
        print(cells.value)
df = pd.DataFrame(data=w)
df.to_excel(r'/Users/ap/info2.xlsx')
I would go with this, which just replaces 'Affiliation:', 'E-mail:' and 'Mobile:' with a delimiter and then splits and assigns each piece to the right column:
import numpy as np

df['Name'] = np.nan
df['Affiliation'] = np.nan
df['Email'] = np.nan
df['Mobile'] = np.nan
for i in range(0, len(df)):
    full_value = df['Companies'].loc[i]
    full_value = full_value.replace('Affiliation:', ';').replace('E-mail:', ';').replace('Mobile:', ';')
    full_value = full_value.split(';')
    df['Name'].loc[i] = full_value[0]
    df['Affiliation'].loc[i] = full_value[1]
    df['Email'].loc[i] = full_value[2]
    df['Mobile'].loc[i] = full_value[3]
del df['Companies']
print(df)
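As an alternative sketch without an explicit loop, pandas' str.extract can pull all four pieces in one pass; this assumes every cell contains the labels Affiliation:, E-mail: and Mobile: in that order, and like the loop above it keeps any leading text (such as Mr.) in the Name column:
import re
import pandas as pd

pattern = r'(?P<Name>.*?)Affiliation:(?P<Affiliation>.*?)E-mail:(?P<Email>.*?)Mobile:(?P<Mobile>.*)'
# DOTALL lets .*? span the newlines inside each combined cell
parts = df['Companies'].str.extract(pattern, flags=re.DOTALL)
df = df.drop(columns='Companies').join(parts)
print(df)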

Pandas & Python: Is there a way to import the contents of an Excel file, add text content to the end of the file, and save down?

I have a list of names stored in an Excel file. The user needs to be able to import that list, which is a single column, add additional names to it, and save the file back down.
I've imported the Excel file using pandas and created a DataFrame (df). I've tried to append to the df using a loop.
import numpy as np
import pandas as pd
path = 'C:\\NY_Operations\\EdV\\Streaman\\Python\\CES Fee Calc\\'
file_main = 'Main.xlsx'
df_main = pd.read_excel(path + file_main)
while True:
    b = input("Enter name of new deal to be added to 'Main' spreadsheet or type 'Exit' ")
    df_main.append(b)
    if df_main[-1] == "Exit":
        df_main.pop()
        break
The spreadsheet has "Toy", "Color", and "Ball" in A1, A2, and A3. The user should be prompted to add new deals; they add "Watch" and "Belt", then type "Exit", and the loop ends. Afterwards, A4 should show "Watch" and A5 should show "Belt" in both the df and the spreadsheet.
I was able to figure it out. Thanks for everyone's help.
import numpy as np
import pandas as pd
# create list of deals in the 'main' consolidation spreadsheet
path = 'C:\\NY_Operations\\EdV\\Streaman\\Python\\CES Fee Calc\\'
file_main = 'Main.xlsx'
df_main = pd.read_excel(path + file_main)
new_deals = []  # each entry is the name of the item purchased
while True:
    g = input("Enter name of item or exit ")
    new_deals.append(g)
    if new_deals[-1] == "exit":
        new_deals.pop()
        break
df_newdeals = pd.DataFrame({'Deal Name': new_deals})
df1 = pd.concat([df_main, df_newdeals])
print(df1)
df1.to_excel(path + file_main)
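One hedged refinement, assuming the extra index column in the saved file is unwanted: reset the index when concatenating and skip it when writing.
df1 = pd.concat([df_main, df_newdeals], ignore_index=True)
df1.to_excel(path + file_main, index=False)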

Print specific columns from an Excel file imported into Python

I have a table as below:
How can I print all the sources that have an 'X' for a particular column? For example, if I specify "Make", the output should be:
Delivery
Reputation
Profitability
PS: The idea is to import the Excel file into Python and do this operation there.
Use pandas:
import pandas as pd
filename = "yourexcelfile"
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["make"] == "X"]
print(frame["source"])

How to search CSV file with multiple search criteria and print row?

I have a .csv file with about 1000 rows which looks like:
id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995
The code I have so far asks for input, then goes over each row and prints the row if it contains the input. It looks like this:
import csv
# Asks for search criteria from user
search = input("Enter search criteria:\n")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if search in row:
        print(row)
What I want as the end result, and what I'm stuck on, is to be able to enter more than one search criterion separated by a "," and have it search and print those rows. Kind of like a way to filter the list.
For example, if there were multiple "David"s that are "Male" in the file, I could enter: David, Male
It would then print all the rows that match but ignore any "David" that is "Female".
You can split the input on the comma, then use all() with a comprehension to check that every search term is present on a given line.
This example uses a simplistic splitting of the input and doesn't care which field each term matches. If you want to match only specific columns, look into using csv.DictReader instead of csv.reader (see the sketch after the code below).
import csv
# Ask for search criteria from user; strip spaces so "David, Male" matches
search_parts = [s.strip() for s in input("Enter search criteria:\n").split(",")]
# Open the csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains every search term
for row in file:
    if all(x in row for x in search_parts):
        print(row)
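For column-specific matching, here is a minimal csv.DictReader sketch; the column names first_name and gender are taken from the sample data above:
import csv

first_name = input("Enter first name: ").strip()
gender = input("Enter gender: ").strip()

with open("MOCK_DATA.csv", newline="") as f:
    # DictReader maps each row to a dict keyed by the header line
    for row in csv.DictReader(f):
        if row["first_name"] == first_name and row["gender"] == gender:
            print(row)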
If you are happy to use a 3rd party library, this is possible with pandas.
I have modified your data slightly to demonstrate a simple query.
import pandas as pd
from io import StringIO
mystr = StringIO("""id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
4,David,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
5,David,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995""")
# replace mystr with 'file.csv'
df = pd.read_csv(mystr)
# retrieve user inputs
first_name = input('Input a first name\n:')
gender = input('Input a gender, Male or Female\n:')
# calculate Boolean mask
mask = (df['first_name'] == first_name) & (df['gender'] == gender)
# apply mask to result
res = df[mask]
print(res)
# id first_name last_name email gender \
# 3 4 David Gillfillan ggillfillan3@hp.com Male
# 4 5 David Gilfether sgilfether4@china.com.cn Male
# ip_address birthday
# 3 83.249.190.232 31/10/1984
# 4 180.153.199.106 11/07/1995
While you could just check whether the strings "David" and "Male" exist in a row, that would not be very precise if you need to check specific column values. Instead, read in the data via csv and create a list of namedtuple objects that store each search value together with the index of its header column:
from collections import namedtuple
import csv
data = list(csv.reader(open('filename.csv')))
search = namedtuple('search', 'value,header')
searches = [search(i, data[0].index(b)) for i, b in zip(input().split(', '), ['first_name', 'gender'])]
final_results = [i for i in data if all(c.value == i[c.header] for c in searches)]
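For instance, assuming the modified sample data from the previous answer is saved as filename.csv and the input is typed exactly as David, Male, the matching rows can then be printed:
for row in final_results:
    print(row)
# ['4', 'David', 'Gillfillan', 'ggillfillan3@hp.com', 'Male', '83.249.190.232', '31/10/1984']
# ['5', 'David', 'Gilfether', 'sgilfether4@china.com.cn', 'Male', '180.153.199.106', '11/07/1995']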
