Python 2: replacing a specific column in a CSV - python

I have some CSV files whose format is ID, timestamp, customerID, email, etc. I want to empty out the Email column and keep the other columns as they are. I'm using Python 2.7 and am restricted from using Pandas. Can anyone help me?
Thank you all for the help
My code is below, but it is not that efficient or reliable; if some rows contain strange characters, the logic breaks.
new_columns = [
'\xef\xbb\xbfID', 'timestamp', 'CustomerID', 'Email', 'CountryCode', 'LifeCycle', 'Package', 'Paystatus', 'NoUsageEver', 'NoUsage', 'VeryLowUsage',
'LowUsage', 'NormalUsage', 'HighUsage', 'VeryHighUsage', 'LastStartDate', 'NPS 0-8', 'NPS Score (Q2)', 'Gender(Q38)', 'DOB(Q39)',
'Viaplay users(Q3)', 'Primary Content (Q42)', 'Primary platform(Q4)', 'Detractor (strong) (Q5)', 'Detractor open text(Q22)',
'Contact Detractor (Q21)', 'Contact Detractor (Q20)', 'Contact Detractor (Q43)', 'Contact Detractor(Q26)', 'Contact Detractor(Q27)',
'Contact Detractor(Q44)', 'Improvement areas(Q7)', 'Improvement areas (Q40)', 'D2 More value for money(Q45)', 'D2 Sport content(Q8)',
'D2 Series content(Q9)', 'D2 Film content(Q10)', 'D2 Children content(Q11)', 'D2 Easy to start and use(Q12)',
'D2 Technical and quality(Q13)',
'D2 Platforms(Q14)', 'D2 Service and support(Q15)', 'D3 Sport content(Q16)', 'Missing Sport Content (Q41)',
'D3 Series and films content(Q17)',
'NPS 9-10', 'Recommendation drivers(Q28)', 'R2 Sport content(Q29)', 'R2 Series content(Q30)', 'R2 Film content(Q31)',
'R2 Children content(Q32)', 'R2 Easy to start and use(Q33)', 'R2 Technical and quality(Q34)', 'R2 Platforms(Q35)',
'R2 Service and support(Q36)',
'Promoter open text(Q37)'
]
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    for row in reader:
        output_row = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            data = row[ind]
            if ind == first_row.index('Email'):
                data = ''
            output_row.append(data)
        writer.writerow(output_row)
File format before and after: (screenshots from the original post omitted)

So you are reordering the columns and clearing the email column:
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    for row in reader:
        output_row = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            data = row[ind]
            if ind == first_row.index('Email'):
                data = ''
            output_row.append(data)
        writer.writerow(output_row)
I would suggest moving the lookups first_row.index(column_name) and first_row.index('Email') out of the per-row processing.
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)
    email = first_row.index('Email')
    indexes = []
    for column_name in new_columns:
        ind = first_row.index(column_name)
        indexes.append(ind)
    for row in reader:
        output_row = []
        for ind in indexes:
            data = row[ind]
            if ind == email:
                data = ''
            output_row.append(data)
        writer.writerow(output_row)
email is the index of the Email column in the input. indexes is a list of the indexes of the input columns, in the order specified by new_columns.
Untested.
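If the rows with "strange characters" the question mentions can come out shorter (or longer) than the header, one defensive tweak is to check the row length before indexing. This is a sketch layered on the answer above, not part of the original, reusing its email and indexes variables:

for row in reader:
    if len(row) != len(first_row):
        # malformed row: report it and move on instead of raising IndexError
        print 'skipping malformed row:', row
        continue
    output_row = ['' if ind == email else row[ind] for ind in indexes]
    writer.writerow(output_row)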

You could use dict versions of the csv reader/writer to get the column by name. Something like this:
import csv

with open('./test.csv', 'r') as infile:
    reader = csv.DictReader(infile, delimiter=";")
    with open('./output.csv', 'w') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row['Email'] = ''
            writer.writerow(row)
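If the output should also be limited to, and ordered by, the new_columns list from the question, DictWriter accepts that list as fieldnames, and extrasaction='ignore' silently drops any input columns that are not listed. A sketch along those lines, assuming new_columns is defined as in the question and keeping the input's semicolon delimiter:

import csv

with open('./test.csv', 'r') as infile, open('./output.csv', 'wb') as outfile:
    # on Python 2 the csv module wants the output file opened in binary mode ('wb')
    reader = csv.DictReader(infile, delimiter=";")
    writer = csv.DictWriter(outfile, fieldnames=new_columns, delimiter=";",
                            extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        row['Email'] = ''   # blank out the email column
        writer.writerow(row)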

Related

How to parse a CSV file in Python?

I need the first column of the table to be written to a variable, and the remaining columns (their number may vary) to be written to a list so I can pull the desired value out of that list. I'm trying to get email addresses, but the table itself is a mess, so every column needs to be checked.
with open('data.csv', 'r', encoding='utf-8-sig', newline='') as file:
    reader = csv.reader(file)
    name = list(next(reader))
    for items in list(reader):
        for item in items:
            if '#' in item:
                if not item in emails:
                    emails.append(item)

with open('result.csv', 'a', encoding='utf-8-sig', newline='') as file:
    writer = csv.writer(file, delimiter=';')
    for email in emails:
        writer.writerow(
            (
                name,
                email
            )
        )
Input:
Наименование,Описание,Адрес,Комментарий к адресу,Почтовый индекс,Микрорайон,Район,Город,Округ,Регион,Страна,Часы работы,Часовой пояс,Телефон 1,E-mail 1,Веб-сайт 1,Instagram 1,Twitter 1,Facebook 1,ВКонтакте 1,YouTube 1,Skype 1,Широта,Долгота,2GIS URL
Магазин автозапчастей,,"Мира, 007",,655153,,,Черногорск,Черногорск городской округ,Республика Хакасия,Россия,Пн: 09:00-18:00; Вт: 09:00-18:00; Ср: 09:00-18:00; Чт: 09:00-18:00; Пт: 09:00-18:00; Сб: 09:00-18:00,+07:00,89130502009,grandauto007#mail.ru,http://avtomagazin.2gis.biz,,,,,,,53.805192,91.334047,https://2gis.com/firm/9711414977516651
Спектр-Авто,автотехцентр,"Вяткина, 4",1 этаж,655017,,,Абакан,Абакан городской округ,Республика Хакасия,Россия,Пн: 09:00-18:00; Вт: 09:00-18:00; Ср: 09:00-18:00; Чт: 09:00-18:00; Пт: 09:00-18:00; Сб: 09:00-18:00,+07:00,89233931771,+79233940022#yandex.ru,http://spectr-avto.2gis.biz,,,,,,,53.716581,91.45005,https://2gis.com/firm/70000001034136187
The result is:
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];grandauto007#mail.ru
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];+79233940022#yandex.ru
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];zhvirblis_yuliya#mail.ru
If I understand the question correctly, what you really want to output is a two-column CSV, with names in the first column, which I assume come from the original CSV's first column, and e-mail in the second column.
If my assumptions are correct, this should work for you:
import csv

with open('data.csv', 'r', encoding='utf-8-sig', newline='') as file:
    reader = csv.reader(file)
    header = list(next(reader))
    emails = []
    for items in reader:
        name = items[0]
        for item in items:
            if '#' in item:
                if not (name, item) in emails:
                    emails.append((name, item))

with open('result.csv', 'a', encoding='utf-8-sig', newline='') as file:
    writer = csv.writer(file, delimiter=';')
    for email in emails:
        writer.writerow(email)
Output:
Магазин автозапчастей;grandauto007#mail.ru
Спектр-Авто;+79233940022#yandex.ru
Things I have changed in your code:
The input CSV header is now read into header - did you want to do anything with that?
The name is now set from items[0] for each row in the input CSV.
The emails list is now a list of (name, email) pairs.
Optimization detail: you don't need to turn reader into a list to iterate over it. Just say for items in reader:, it'll be more efficient since it will process each row as it reads it instead of storing them all into a list.
import petl
table = petl.fromcsv('data.csv', encoding='utf-8-sig')
table2 = petl.addfield(table, 'email_address', lambda r: [r[r1] for r1 in petl.header(table) if '#' in r[r1]])
table3 = petl.cut(table2, 'Наименование', 'email_address')
petl.tocsv(table3, 'result.csv', encoding='utf-8-sig', delimiter=';', write_header=True)
Load the CSV into a table
Create a new field (column) that is an aggregate of any field containing an email address
Reduce (cut) the table to only contain the 2 important fields: 'Наименование', 'email_address'
Output the results to a CSV
Output:
Наименование;email_address
Магазин автозапчастей;['grandauto007#mail.ru']
Спектр-Авто;['+79233940022#yandex.ru']
Be sure to install petl:
pip install petl
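If a plain string rather than a Python list should end up in the email_address cell, the lambda can join the matches instead. A small tweak of the same sketch, not part of the original answer:

table2 = petl.addfield(
    table, 'email_address',
    lambda r: ', '.join(r[f] for f in petl.header(table) if '#' in r[f]))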

How to skip empty cells using csv.DictWriter

I am trying to anonymize data in a CSV; however, I only want to do this for cells that are not empty. At present, my program adds anonymized data to all cells in a given row.
How can I skip the empty cells? Below is my program:
import csv
from faker import Faker
from collections import defaultdict

def anonymize():
    "Anonymizes the given original data to anonymized form"
    faker = Faker()
    names = defaultdict(faker.name)
    emails = defaultdict(faker.email)
    with open(filename, "r") as f:
        with open(f"{filename}-anonymized_data.csv", "w") as o:
            reader = csv.DictReader(f)
            writer = csv.DictWriter(o, reader.fieldnames)
            writer.writeheader()
            for row in reader:
                row["adult_First_Name"] = names[row["adult_First_Name"]]
                row["child_First_Name"] = names[row["child_First_Name"]]
                row["Adult - EMAIL ADDRESS"] = emails[row["Adult - EMAIL ADDRESS"]]
                row["Parent - EMAIL ADDRESS"] = emails[row["Parent - EMAIL ADDRESS"]]
                writer.writerow(row)

if __name__ == "__main__":
    anonymize()
You could test each field before applying the fake value. A simpler approach would be to store the fields that need to be changed in a fields list along with which faker function to apply if needed:
import csv
from faker import Faker

def anonymize():
    "Anonymizes the given original data to anonymized form"
    faker = Faker()
    fields = [
        ("adult_First_Name", faker.name),
        ("child_First_Name", faker.name),
        ("Adult - EMAIL ADDRESS", faker.email),
        ("Parent - EMAIL ADDRESS", faker.email),
    ]
    with open(filename, "r") as f:
        with open(f"{filename}-anonymized_data.csv", "w", newline="") as o:
            reader = csv.DictReader(f)
            writer = csv.DictWriter(o, reader.fieldnames)
            writer.writeheader()
            for row in reader:
                for field, fake in fields:
                    if row[field]:
                        row[field] = fake()
                writer.writerow(row)

if __name__ == "__main__":
    anonymize()
Adding newline='' would stop extra blank lines in the output.
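If the one-to-one mapping from the question still matters (the same original name or address should always map to the same fake value), the defaultdicts can be kept and combined with the same skip-if-empty check. A sketch, assuming filename is defined elsewhere as in the question:

import csv
from collections import defaultdict
from faker import Faker

def anonymize():
    "Anonymizes the given original data, leaving empty cells empty"
    faker = Faker()
    names = defaultdict(faker.name)    # same real name -> same fake name
    emails = defaultdict(faker.email)  # same real address -> same fake address
    fields = [
        ("adult_First_Name", names),
        ("child_First_Name", names),
        ("Adult - EMAIL ADDRESS", emails),
        ("Parent - EMAIL ADDRESS", emails),
    ]
    with open(filename, "r", newline="") as f:
        with open(f"{filename}-anonymized_data.csv", "w", newline="") as o:
            reader = csv.DictReader(f)
            writer = csv.DictWriter(o, reader.fieldnames)
            writer.writeheader()
            for row in reader:
                for field, mapping in fields:
                    if row[field]:              # only replace non-empty cells
                        row[field] = mapping[row[field]]
                writer.writerow(row)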

How do I create a loop so that I get all the queries into one CSV through Python?

I have created a function that fetches price, rating, etc after it hits an API:
def is_priced(business_id):
    try:
        priced_ind = get_business(API_KEY, business_id)
        priced_ind1 = priced_ind['price']
    except:
        priced_ind1 = 'None'
    return priced_ind1

priced_ind = is_priced(b_id)
print(priced_ind)
Similar for rating
def is_rated(business_id):
    try:
        rated_ind = get_business(API_KEY, business_id)
        rated_ind1 = rated_ind['rating']
    except:
        rated_ind1 = 'None'
    return rated_ind1
However, I want my function to loop through the business names I have in my CSV file, collect all this data, and export it to a new CSV file with these two parameters beside the names of the businesses.
The CSV file has info on the name of the business along with its address, city, state, zip and country.
Eg:
Name address city state zip country
XYZ(The) 5* WE 223899th St. New York NY 19921 US
My output:
Querying https://api.xyz.com/v3/businesses/matches ...
True
Querying https://api.xyz.com/v3/businesses/matches ...
4.0
Querying https://api.xyz.com/v3/businesses/matches ...
$$
Querying https://api.xyz.com/v3/businesses/matches ...
Querying https://api.xyz.com/v3/businesses/matches ...
The real issue is that my output only returns the business ID in the CSV, and the rating etc., as you see, is just printed to the console. How do I set up a loop so that the info I want for all the businesses ends up in a single CSV?
The csv module is useful for this sort of thing e.g.
import csv

with open('f.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    with open('tmp.csv', 'w') as output:
        writer = csv.writer(output)
        for row in reader:
            business_id = row[0]
            # get_price_index / get_rate_index stand in for your is_priced / is_rated helpers
            row.append(get_price_index(business_id))
            row.append(get_rate_index(business_id))
            writer.writerow(row)
You can read the business names from the CSV file, iterate over them using a for loop, hit the API and store the results, and write to a new CSV file.
import csv

data = []
with open('businesses.csv') as fp:
    # skip header line
    header = next(fp)
    reader = csv.reader(fp)
    for row in reader:
        b_name = row[0]
        # not sure how you get the business ID:
        b_id = get_business_id(b_name)
        p = is_priced(b_id)
        r = is_rated(b_id)
        data.append((b_name, p, r))

# write out the results
with open('business_data.csv', 'w') as fp:
    writer = csv.writer(fp)
    writer.writerow(['name', 'price', 'rating'])
    for row in data:
        writer.writerow(row)
You can do this easily using pandas:
import pandas as pd

csv = pd.read_csv('your_csv.csv', usecols=['business_name'])  # since you only need the name
# each row is passed to your functions
csv['price'] = csv.apply(is_priced, axis=1)
csv['rating'] = csv.apply(is_rated, axis=1)
csv.to_csv('result.csv', index=False)
All you have to do in your functions is:
def is_priced(row):
    business_name = row['business_name']
    business_id = ??
    ...

Error reading csv numbers like "-1 000,00"

I get an error when I try to read numbers like "-1 000,00".
The result is '-1\xa0000,00'.
How do I fix my code to clear the error?
import csv

def read_csv(filename):
    list = []
    with open(filename, 'r', encoding='utf-8') as local_file:
        fields = ['Account_group_name',
                  'Current_balance',
                  'Account_name',
                  'Transfer_account_name',
                  'Description',
                  'Partner_name',
                  'Category',
                  'Date',
                  'Time',
                  'Memo',
                  'Sum',
                  'Currency',
                  'Face_balance',
                  'Balance',
                  ]
        reader = csv.DictReader(local_file, fields, delimiter=';')
        next(reader)
        for row in reader:
            list.append(row)
    return list
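There is nothing wrong with the reading itself: '\xa0' is a non-breaking space used as the thousands separator, and the decimal separator is a comma, so the csv module hands the text over exactly as it appears in the file. One way to turn such a field into a number, a sketch rather than part of the original post, assuming the amounts should become floats:

def to_float(text):
    """Convert strings like '-1\xa0000,00' into -1000.0."""
    cleaned = text.replace('\xa0', '').replace(' ', '')  # drop thousands separators
    cleaned = cleaned.replace(',', '.')                  # decimal comma -> dot
    return float(cleaned)

for row in read_csv('transactions.csv'):   # 'transactions.csv' is a placeholder name
    balance = to_float(row['Balance'])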

Python csv file writing in different columns

I'm trying to read a CSV file and create a new CSV file with the contents of the old one using Python. My problem is that all entries are saved in the first column, and I can't find a way to save the information in different columns. Here is my code:
import csv
from itertools import zip_longest

fieldnamesOrdered = ['First Name', 'Last Name', 'Email', 'Phone Number',
                     'Street Address', 'City', 'State', 'HubSpot Owner',
                     'Lifecyle Stage', 'Lead Status', 'Favorite Color']
listOne = []
listTwo = []

with open('Contac.csv', 'r', encoding = 'utf-8') as inputFile, \
     open('result.csv', 'w', encoding = 'utf-8') as outputFile:
    reader = csv.DictReader(inputFile)
    writer = csv.writer(outputFile, delimiter = 't')
    for row in reader:
        listOne.append(row['First Name'])
        listTwo.append(row['Last Name'])
    dataLists = [listOne, listTwo]
    export_data = zip_longest(*dataLists, fillvalue='')
    writer.writerow(fieldnamesOrdered)
    writer.writerows(export_data)

inputFile.close()
outputFile.close()
Thank you very much for your answers
writer = csv.writer(outputFile, delimiter = 't')
Aren't those entries in the first column additionally interspersed with strange unsolicited 't' characters?
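The delimiter argument takes the literal separator character, so 't' writes the letter t between fields instead of a tab. A minimal fix, assuming tab-separated output was actually intended:

writer = csv.writer(outputFile, delimiter = '\t')   # a real tab, not the letter "t"

If you want a regular comma-separated file that spreadsheets split into columns, drop the delimiter argument entirely and let csv.writer use its default comma.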
