Python - How to compare CSV with duplicate Names and Increment values


I'm trying to take the first column (Name) and the fourth column (Is Active) from a CSV file and do the following:
Create a single entry for each company name.
If 'Is Active' is Yes, increment that company's yes-count.
If 'Is Active' is No, increment that company's no-count, so that each company ends up with an active / not-active pair of totals.
The Data1 and Data2 fields are other columns that I don't care about at this time.
csv =
Name,Data1,Data2, Is Active:
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,No
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 3,Data1,Data2,No
Company 3,Data1,Data2,No
Ideal result would be in the format of:
Company name, Yes-count, no-count
I've started with csv.reader to read the columns and I can put them into lists, but I'm unsure how to compare and consolidate names and counts after that.
Any help would be greatly appreciated.

One way to do it:
import csv

with open("your_csv_file", "r", newline="") as file:
    reader = csv.reader(file)
    _ = next(reader)  # skip header
    consolidated = {}
    for line in reader:
        company_name = line[0]
        is_active = line[3]
        # First time we see this company: start both counters at zero
        if company_name not in consolidated:
            consolidated[company_name] = {"yes_count": 0, "no_count": 0}
        if is_active == "Yes":
            consolidated[company_name]["yes_count"] += 1
        else:
            consolidated[company_name]["no_count"] += 1
Sample Output:
>>> print(consolidated)
{
    'Company 1': {'yes_count': 3, 'no_count': 0},
    'Company 2': {'yes_count': 3, 'no_count': 1},
    'Company 3': {'yes_count': 0, 'no_count': 2}
}
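If you want the "Company name, Yes-count, no-count" format from the question directly, a variant with collections.Counter might look like the sketch below. It reads the sample rows from a string so it can be run as-is; the names SAMPLE, count_active and report are made up for this example, and a real script would pass an open file instead of io.StringIO.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the question; a real script would open the file instead.
SAMPLE = """\
Name,Data1,Data2, Is Active:
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,No
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 3,Data1,Data2,No
Company 3,Data1,Data2,No
"""


def count_active(lines):
    """Return {company: Counter of 'Yes'/'No'} in first-seen order."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    counts = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        name, is_active = row[0], row[3].strip()
        # setdefault creates the Counter the first time a company appears
        counts.setdefault(name, Counter())[is_active] += 1
    return counts


counts = count_active(io.StringIO(SAMPLE))
# A Counter returns 0 for a missing key, so companies with no 'No' rows still work.
report = [f"{name},{c['Yes']},{c['No']}" for name, c in counts.items()]
print("\n".join(report))
```

Unlike the if/else version, this counts only rows that say exactly Yes or No; anything else ends up under its own key in the Counter and is left out of the report.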

Related

Increase Speed of Nested For Loops While Changing Value of a DataFrame

I'm looking to increase the speed of the nested for loops.
VARIABLES:
'dataframe' - The dataframe I am attempting to modify in the second for loop. It consists of a multitude of training sessions for the same people. This is the attendance document that is changed if a match exists in the reporting dataframe.
'dictNewNames' - This is a dictionary of session title names. The key is the longer session title name and the value is a stripped session title name. For example {'Week 1: Training': 'Training'} etc. The key is equal to the 'Session Title' column in each row but the value is used for searching a substring in the second for loop.
'reporting' - A dataframe that includes information regarding session titles and attendance participation. The reporting dataframe is already filtered so everyone in the 'reporting' dataframe should get credit in 'dataframe'. The only caveat is that the 'search' name is nested within the pathway title.
dataframe = {
    'Session Title': ['Organization Week 1: Train', 'Organization Week 2: Train', 'Organization Week 3: Train'],
    'Attendee Email': ['name#gmail.com', 'name2#gmail.com', 'name3#gmail.com'],
    'Completed': ['No', 'No', 'No'],
    'Date Completed': ['', '', '']}
dictNewNames = {'Organization Week 1: Train': 'Train', 'Organization Week 2: Train': 'Train', 'Organization Week 3: Train': 'Train'}
The inconsistent title formatting is not a typo (e.g. ':' vs '-' in the pathway titles below). The data's formatting really is all over the place.
reporting = {
    'Pathway Title': ['Training 1 - Train', 'Training 2: Train', 'Training 3 - Train'],
    'Email': ['name#gmail.com', 'name2#gmail.com', 'name3#gmail.com'],
    'Date Completed': ['xx/yy/xx', 'yy/xx/zz', 'zz/xx/yy']}
expectedOutput = {
    'Session Title': ['Organization Week 1: Train', 'Organization Week 2: Train', 'Organization Week 3: Train'],
    'Attendee Email': ['name#gmail.com', 'name2#gmail.com', 'name3#gmail.com'],
    'Completed': ['Yes', 'Yes', 'Yes'],
    'Date Completed': ['xx/yy/xx', 'yy/xx/zz', 'zz/xx/yy']}
My code:
def giveCredit(dataframe, dictNewNames, reporting):
    for index, row in dataframe.iterrows():
        temp = row['Session Title']
        searchName = dictNewNames[temp]
        attendeeEmail = row['Attendee: Email']
        for index1, row1 in reporting.iterrows():
            pathwayTitle = row1['Pathway Title']
            Email = row1['Organization Email']
            dateCompleted = row1['Date Completed']
            if attendeeEmail == Email and searchName in pathwayTitle:
                dataframe.at[index, 'Completed'] = 'Yes'
                dataframe.at[index, 'Date Completed'] = dateCompleted
                break
    return dataframe

Your pattern looks like a merge:
for loop1 on first dataframe:
    for loop2 on second dataframe:
        if conditions match between both dataframes:
So:
import numpy as np
import pandas as pd

# Create a common key Name based on dictNewNames
pat = fr"({'|'.join(dictNewNames.values())})"
name1 = dataframe['Session Title'].map(dictNewNames)
# expand=False returns a Series rather than a one-column DataFrame
name2 = reporting['Pathway Title'].str.extract(pat, expand=False)
# Merge dataframes based on this key and email
out = pd.merge(dataframe.assign(Name=name1),
               reporting.assign(Name=name2),
               left_on=['Name', 'Attendee Email'],
               right_on=['Name', 'Email'],
               how='left', suffixes=(None, '_'))
# Update the dataframe: take the date from reporting, then derive Completed from it
out['Date Completed'] = out.pop('Date Completed_')
out['Completed'] = np.where(out['Date Completed'].notna(), 'Yes', 'No')
out = out[dataframe.columns]
Output:
>>> out
     Session Title   Attendee Email Completed Date Completed
0  Week 1: Train 1   name#gmail.com       Yes       xx/yy/xx
1  Week 2: Train 2  name2#gmail.com       Yes       yy/xx/zz
2  Week 3: Train 3  name3#gmail.com       Yes       zz/xx/yy
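To see why matching on a key is so much cheaper than the nested loops, here is a rough plain-Python sketch of the same idea. It does not reimplement pandas' merge. It only shows that indexing one table by email once lets each attendance row check just that person's reporting rows. The names attendance, short_names and by_email are made up for this example, and the fourth attendance row is a hypothetical one added to show an unmatched case.

```python
# Sample data from the question, as lists of plain dicts.
attendance = [
    {"Session Title": "Organization Week 1: Train", "Attendee Email": "name#gmail.com",
     "Completed": "No", "Date Completed": ""},
    {"Session Title": "Organization Week 2: Train", "Attendee Email": "name2#gmail.com",
     "Completed": "No", "Date Completed": ""},
    {"Session Title": "Organization Week 3: Train", "Attendee Email": "name3#gmail.com",
     "Completed": "No", "Date Completed": ""},
    # Hypothetical extra row: nobody with this email appears in reporting.
    {"Session Title": "Organization Week 1: Train", "Attendee Email": "name4#gmail.com",
     "Completed": "No", "Date Completed": ""},
]
reporting = [
    {"Pathway Title": "Training 1 - Train", "Email": "name#gmail.com", "Date Completed": "xx/yy/xx"},
    {"Pathway Title": "Training 2: Train", "Email": "name2#gmail.com", "Date Completed": "yy/xx/zz"},
    {"Pathway Title": "Training 3 - Train", "Email": "name3#gmail.com", "Date Completed": "zz/xx/yy"},
]
short_names = {
    "Organization Week 1: Train": "Train",
    "Organization Week 2: Train": "Train",
    "Organization Week 3: Train": "Train",
}

# Build the index once: email -> list of that person's reporting rows.
by_email = {}
for rec in reporting:
    by_email.setdefault(rec["Email"], []).append(rec)

# Each attendance row now scans only the rows for its own email.
for row in attendance:
    search = short_names[row["Session Title"]]
    for rec in by_email.get(row["Attendee Email"], []):
        if search in rec["Pathway Title"]:
            row["Completed"] = "Yes"
            row["Date Completed"] = rec["Date Completed"]
            break

for row in attendance:
    print(row["Attendee Email"], row["Completed"], row["Date Completed"])
```

The inner loop still uses the substring test from the question, but it only runs over one person's rows instead of the whole reporting table.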

This workaround cut my execution time from 460 seconds to under 10.
import re
import pandas as pd

def giveCredit(dataframe, dictNewNames, reporting):
    reporting['Date Completed'] = pd.to_datetime(reporting['Date Completed'])
    for index1, row in dataframe.iterrows():
        temp = row['Session Title']
        numberList = re.findall('[0-9]+', temp)
        finalNumber = str(numberList[0])
        searchName = dictNewNames[temp]
        attendeeEmail = row['Attendee: Email']
        # A vectorized filter on reporting replaces the inner iterrows() loop
        matches = reporting.loc[(reporting['Pathway Title'].str.contains(searchName, case=False)) & (reporting['Organization Email'] == attendeeEmail)]
        if len(matches.index) != 0:
            # Prefer a match whose title also contains the session number
            new_row = matches.loc[matches['Pathway Title'].str.contains(finalNumber, case=False)]
            if len(new_row.index) != 0:
                dataframe = modifyFrame(dataframe, new_row, index1)
            else:
                dataframe = modifyFrame(dataframe, matches, index1)
    dataframe = dataframe.sort_values(["Completed", "Attendee"], ascending=[False, True])
    return dataframe

def modifyFrame(frame, row, index1):
    dateCompleted = row['Date Completed']
    dateCompleted = dateCompleted.to_string(buf=None, header=False, index=False, length=False, name=False, max_rows=None).strip()
    # Write to the frame that was passed in (the original referenced an undefined name here)
    frame.at[index1, 'Completed'] = 'Yes'
    frame.at[index1, 'Date Completed'] = dateCompleted
    return frame

Turning a CSV file into a dictionary

I am trying to make a dictionary using the values in my CSV table. I am trying to use 'Genre' as the key, with a list of tuples holding the name, publisher, and platform as the value:
D1 = { genre: [(name, publisher, platform), ...], ... }
My Code:
import csv
fp = open('video_game_sales_tiny.csv','r')
fp.readline()
data_reader = csv.reader(fp)
D1 = {}
for line in data_reader:
    name = line[0].lower()
    platform = line[1]
    year = line[2]
    genre = line[3].lower()
    publisher = line[4].lower()
    D1[genre] = [name, publisher, platform, year]
There are multiple rows with the same genre, and when the loop reaches a genre that is already a key, it overwrites the existing value instead of adding a tuple to the list.
I am trying to make the dictionary look like:
D1 = { Puzzle: [ (Pac-man, Atari, 2600, 1982), (BurgerTime, Mattel Interactive, 2600, 1981), (Q*bert, Parker Bros, 2600, 1982) ], Shooter: [ (), (), () ], Action: [ (), (), () ] }

You need to make each value of your dictionary a list of tuples. Then you can append new tuples instead of overwriting the existing value. Here is an example using your code:
for line in data_reader:
    name = line[0].lower()
    platform = line[1]
    year = line[2]
    genre = line[3].lower()
    publisher = line[4].lower()
    if genre in D1:
        # Genre already seen: add this game to its list
        D1[genre].append((name, publisher, platform, year))
    else:
        # First game of this genre: start a new list
        D1[genre] = [(name, publisher, platform, year)]
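The if/else can also be folded away with collections.defaultdict, which creates the empty list the first time a genre is seen. The sketch below is one way to write it. It reads from a string so it runs on its own. The header row and the last data row are placeholders invented for the example, and the column order (name, platform, year, genre, publisher) is taken from the indexes used in the question's code.

```python
import csv
import io
from collections import defaultdict

# Three rows from the question's desired output plus one placeholder row.
SAMPLE = """\
Name,Platform,Year,Genre,Publisher
Pac-Man,2600,1982,Puzzle,Atari
BurgerTime,2600,1981,Puzzle,Mattel Interactive
Q*bert,2600,1982,Puzzle,Parker Bros
Example Game,2600,1980,Shooter,Example Publisher
"""

games_by_genre = defaultdict(list)  # missing genres start as an empty list

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
for name, platform, year, genre, publisher in reader:
    games_by_genre[genre.lower()].append(
        (name.lower(), publisher.lower(), platform, year)
    )

for genre, games in games_by_genre.items():
    print(genre, games)
```

With a plain dict, D1.setdefault(genre, []).append(...) does the same job in one line.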


Cannot get the value if the sharepoint column type is "Person" - Python

I am trying to extract a list from SharePoint. If the column type is "Person or Group", Python raises a KeyError, but for any other column type I can get the value.
This is my code to get the values:
print("Item title: {0}, Id: {1}".format(item.properties["Title"], item.properties['AnalystName']))
Title works but AnalystName does not. Both are the internal names of the columns in SharePoint.

from shareplum import Office365, Site

authcookie = Office365('https://xxxxxxxxx.sharepoint.com', username='xxxxxxxxx', password='xxxxxxxxx').GetCookies()
site = Site('https://xxxxxxxxxxxx.sharepoint.com/sites/qualityassuranceteam', authcookie=authcookie)
new_list = site.List('Process Review - Customer Service Opt In/Opt Out')
query = {'Where': [('Gt', 'Audit Date', '2020-02-16')]}
sp_data = new_list.GetListItems(fields=['App ID', 'Analyst Name', 'Team Member Name', "Team Member's Supervisor Name",
                                        'Audit Date', 'Event Date (E.g. Call date)', 'Product Type', 'Master Contact Id',
                                        'Location', 'Team member read the disclosure?', 'Team member withheld the disclosure?',
                                        'Did the team member take the correct action?', 'Did the team member notate the account?',
                                        'Did the team member add the correct phone number?', 'Comment (Required)',
                                        'Modified'], query=query)
#print(sp_data[0])
final_file = ''  # Create an empty file
num = 0
for k in sp_data:
    values = sp_data[num].values()
    val = "|".join(str(v).replace('None', 'null') for v in values) + '\n'
    num += 1
    final_file += val
file_name = 'test.txt'
with open(file_name, 'a', encoding='utf-8') as file:
    file.write(final_file)
So right now I'm getting what I want, but there is a problem. When a column is empty, it is skipped instead of producing an empty field. For example:
col-1 | col-2 | col-3 |
HI    | 10    | 8     |
Hello |       | 7     |
In this table, row 1 is full, so it gives me everything:
HI|10|8
but the second row gives me
Hello|7
and I need Hello||7
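One possible way to keep the empty positions, assuming each item in sp_data is a dict keyed by field name that simply leaves out empty fields (which is what the output above suggests), is to walk a fixed list of field names and look each one up with dict.get. The sketch below uses a small hand-written sp_data in place of the real SharePoint result; FIELDS and to_pipe_lines are names made up for this example.

```python
# Hypothetical stand-in for sp_data: a list of dicts where empty fields are absent.
FIELDS = ["col-1", "col-2", "col-3"]
sp_data = [
    {"col-1": "HI", "col-2": 10, "col-3": 8},
    {"col-1": "Hello", "col-3": 7},  # col-2 is empty, so the key is missing
]


def to_pipe_lines(items, fields):
    """Build one pipe-separated line per item, with one slot per field."""
    lines = []
    for item in items:
        cells = []
        for field in fields:
            value = item.get(field)  # None when the field is missing
            cells.append("" if value is None else str(value))
        lines.append("|".join(cells))
    return lines


lines = to_pipe_lines(sp_data, FIELDS)
print("\n".join(lines))
```

Because the columns come from the field list rather than from item.values(), every row has the same number of separators.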

Person fields are returned under different names than the column's own name.
For example, UserName gets changed to UserNameId and UserNameString.
That is the reason for the KeyError: the item's properties do not contain a key with the original name.
Use the code below to get the person field values:
#Python Code
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
site_url = "enter sharepoint url"
sp_list = "enter list name"
ctx = ClientContext(site_url).with_credentials(UserCredential("username","password"))
tasks_list = ctx.web.lists.get_by_title(sp_list)
# Expand the person field so its Id and Title come back with each item
items = tasks_list.items.get().select(["*", "UserName/Id", "UserName/Title"]).expand(["UserName"]).execute_query()
for item in items:  # type:ListItem
    print("{0}".format(item.properties.get('UserName').get("Title")))

I need help on how to save dictionary elements into a csv file

I intend to save a contact list with name and phone number in a .csv file from user input through a dictionary.
The problem is that only the name is saved on the .csv file and the number is omitted.
contacts={}
def phone_book():
    running=True
    while running:
        command=input('A(dd D)elete L)ook up Q)uit: ')
        if command=='A' or command=='a':
            name=input('Enter new name: ')
            print('Enter new number for', name, end=':' )
            number=input()
            contacts[name]=number
        elif command=='D' or command=='d':
            name= input('Enter the name to delete: ')
            del contacts[name]
        elif command=='L' or command=='l':
            name= input('Enter name to search: ')
            if name in contacts:
                print(name, contacts[name])
            else:
                print("The name is not in the phone book, use A or a to save")
        elif command=='Q' or command=='q':
            running= False
        elif command =='list':
            for k,v in contacts.items():
                print(k,v)
        else:
            print(command, 'is not a valid command')
def contact_saver():
    import csv
    global name
    csv_columns=['Name', 'Phone number']
    r=[contacts]
    with open(r'C:\Users\Rigelsolutions\Documents\numbersaver.csv', 'w') as f:
        dict_writer=csv.writer(f)
        dict_writer.writerow(csv_columns)
        for data in r:
            dict_writer.writerow(data)
phone_book()
contact_saver()

As I read your code, contacts will look like
{
    'name1': '1',
    'name2': '2'
}
The keys are the names and the values are the numbers.
But r = [contacts] is a list holding a single dictionary, so for data in r passes the whole dictionary to writerow. Iterating over a dictionary yields only its keys, which is why the names are written and the numbers are omitted. writerow expects a list such as [name, number].
You can do two things here. Either iterate over the contacts properly:
for k, v in contacts.items():
    dict_writer.writerow([k, v])
Or build the contacts as a list of dictionaries
[{
    'name': 'name1',
    'number': 1
}]
so you can use a DictWriter:
fieldnames = ['name', 'number']
writer = csv.DictWriter(f, fieldnames=fieldnames)
...
# then you can insert by
for contact in contacts:
    writer.writerow(contact)  # which should look like writer.writerow({'name': 'name1', 'number': 1})
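For reference, here is a small runnable sketch of the first option. It writes to an in-memory buffer instead of the file path from the question so it can be tried anywhere; contacts_to_csv is a name made up for this example.

```python
import csv
import io

contacts = {"name1": "1", "name2": "2"}  # same shape as the dictionary above


def contacts_to_csv(contacts):
    """Return the contacts as CSV text with one name/number pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Phone number"])
    # items() yields (name, number) pairs, which writerows writes one per row
    writer.writerows(contacts.items())
    return buffer.getvalue()


csv_text = contacts_to_csv(contacts)
print(csv_text)
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, so that no extra blank lines appear on Windows.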

How can I get user_id in separated place?

I want to parse an Excel file, build a dictionary from it, and connect it to the User model that has the same user_id as the dictionary.
In the Excel file, user_id is in cell F1, apart from the rest of the data, so I really cannot work out how to build the dictionary.
My views.py is currently:
#coding:utf-8
from django.shortcuts import render
import xlrd
from .models import User
book = xlrd.open_workbook('../data/excel1.xlsx')
sheet = book.sheet_by_index(1)
def build_employee(employee):
    if employee == 'leader':
        return 'l'
    if employee == 'manager':
        return 'm'
    if employee == 'others':
        return 'o'
for row_index in range(sheet.nrows):
    rows = sheet.row_values(row_index)
    is_man = rows[4] != ""
    emp = build_employee(rows[5])
    user = User(user_id=rows[1], name_id=rows[2], name=rows[3],
                age=rows[4],man=is_man,employee=emp)
    user.save()
book2 = xlrd.open_workbook('../data/excel2.xlsx')
sheet2 = book2.sheet_by_index(0)
headers = sheet2.row_values(0)
large_item = None
data_dict = {}
for row_index in range(sheet2.nrows):
    rows2 = sheet2.row_values(row_index)
    large_item = rows2[1] or large_item
    # Create dict with headers and row values
    row_data = {}
    for idx_col, value in enumerate(rows2):
        header_value = headers[idx_col]
        # Avoid to add empty column. A column in your example
        if header_value:
            row_data[headers[idx_col]] = value
    # Add row_data to your data_dict with
    data_dict[row_index] = row_data
for row_number, row_data in data_dict.items():
    user1 = User.objects.filter(user_id = data['user_id']).exists()
    if user1:
        user1.__dict__.update(**data_dict)
        user1.save()
My code can only pick up data that sits in the same row (in this case B4 to E4), so I cannot work out how to reach my goal. How should I write it?
The ideal dictionary is
{"user_id":1, "name":"Blear","nationality":"America","domitory":"A","group":1}

Your spreadsheet appears to have only one entry. If that is the case, you do not need to iterate over the rows; just extract the cells you need, for example:
import xlrd
book = xlrd.open_workbook('excel1.xlsx')
sheet = book.sheet_by_index(0)
# (key, row index, column index) for each value, all zero-based
cells = [
    ('user_id', 0, 5),
    ('name', 3, 1),
    ('nationality', 3, 2),
    ('domitory', 3, 3),
    ('group', 3, 4)]
user1 = {key:sheet.cell_value(rowy, colx) for key, rowy, colx in cells}
print(user1)
Giving you:
{'nationality': u'America', 'user_id': 1.0, 'name': u'Blear', 'group': 1.0, 'domitory': u'A'}
This uses a Python dictionary comprehension to build the user1 dictionary from the (key, row, column) entries in cells.
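To try the comprehension without an Excel file, the sketch below swaps the worksheet for a plain nested list. The grid layout is a guess at the spreadsheet from the question (user_id alone in F1, the other values in B4 to E4), and cell_value here is a made-up helper that stands in for the sheet lookup used above.

```python
# Hypothetical grid standing in for the worksheet: grid[row][col], zero-based.
grid = [
    ["", "", "", "", "", 1],                                # row 0: user_id in column F
    ["", "", "", "", "", ""],
    ["", "name", "nationality", "domitory", "group", ""],  # row 2: headers
    ["", "Blear", "America", "A", 1, ""],                  # row 3: values in B..E
]

# (key, row index, column index) for every value we want to pull out.
cells = [
    ("user_id", 0, 5),
    ("name", 3, 1),
    ("nationality", 3, 2),
    ("domitory", 3, 3),
    ("group", 3, 4),
]


def cell_value(rowx, colx):
    """Stand-in for the sheet lookup: return the value at (row, column)."""
    return grid[rowx][colx]


user1 = {key: cell_value(rowx, colx) for key, rowx, colx in cells}
print(user1)
```

Keeping the coordinates in one list means that if a cell moves in the spreadsheet, only its entry in cells has to change.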
