I need the first column of the table to be written to a variable, and the remaining columns (their number may vary) to be written to a list so that I can pick the desired value out of the list. I'm trying to get email addresses, but the table itself is a mess, so every column needs to be checked.
with open('data.csv', 'r', encoding='utf-8-sig', newline='') as file:
    reader = csv.reader(file)
    name = list(next(reader))
    for items in list(reader):
        for item in items:
            if '#' in item:
                if not item in emails:
                    emails.append(item)
with open('result.csv', 'a', encoding='utf-8-sig', newline='') as file:
    writer = csv.writer(file, delimiter=';')
    for email in emails:
        writer.writerow(
            (
                name,
                email
            )
        )
Input:
Наименование,Описание,Адрес,Комментарий к адресу,Почтовый индекс,Микрорайон,Район,Город,Округ,Регион,Страна,Часы работы,Часовой пояс,Телефон 1,E-mail 1,Веб-сайт 1,Instagram 1,Twitter 1,Facebook 1,ВКонтакте 1,YouTube 1,Skype 1,Широта,Долгота,2GIS URL
Магазин автозапчастей,,"Мира, 007",,655153,,,Черногорск,Черногорск городской округ,Республика Хакасия,Россия,Пн: 09:00-18:00; Вт: 09:00-18:00; Ср: 09:00-18:00; Чт: 09:00-18:00; Пт: 09:00-18:00; Сб: 09:00-18:00,+07:00,89130502009,grandauto007#mail.ru,http://avtomagazin.2gis.biz,,,,,,,53.805192,91.334047,https://2gis.com/firm/9711414977516651
Спектр-Авто,автотехцентр,"Вяткина, 4",1 этаж,655017,,,Абакан,Абакан городской округ,Республика Хакасия,Россия,Пн: 09:00-18:00; Вт: 09:00-18:00; Ср: 09:00-18:00; Чт: 09:00-18:00; Пт: 09:00-18:00; Сб: 09:00-18:00,+07:00,89233931771,+79233940022#yandex.ru,http://spectr-avto.2gis.biz,,,,,,,53.716581,91.45005,https://2gis.com/firm/70000001034136187
The result is:
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];grandauto007#mail.ru
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];+79233940022#yandex.ru
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];zhvirblis_yuliya#mail.ru
If I understand the question correctly, what you really want to output is a two-column CSV, with names in the first column (which I assume come from the original CSV's first column) and e-mail addresses in the second column.
If my assumptions are correct, this should work for you:
import csv

with open('data.csv', 'r', encoding='utf-8-sig', newline='') as file:
    reader = csv.reader(file)
    header = list(next(reader))
    emails = []
    for items in reader:
        name = items[0]
        for item in items:
            if '#' in item:
                if (name, item) not in emails:
                    emails.append((name, item))
with open('result.csv', 'a', encoding='utf-8-sig', newline='') as file:
    writer = csv.writer(file, delimiter=';')
    for email in emails:
        writer.writerow(email)
Output:
Магазин автозапчастей;grandauto007#mail.ru
Спектр-Авто;+79233940022#yandex.ru
Things I have changed in your code:
The input CSV header is now read into header - did you want to do anything with that?
The name is now set from items[0] for each row in the input CSV.
The emails list is now a list of (name, email) pairs.
Optimization detail: you don't need to turn reader into a list to iterate over it. Just write for items in reader:, which is more efficient because each row is processed as it is read instead of all rows being stored in a list first.
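To see the difference, here is a tiny self-contained illustration (the sample data is made up for the demo):
import csv
import io

# csv.reader is already an iterator: it parses and yields one row per loop
# pass, so wrapping it in list() only buffers every row in memory up front.
f = io.StringIO("a,b\n1,2\n3,4\n")
for row in csv.reader(f):
    print(row)  # ['a', 'b'], then ['1', '2'], then ['3', '4']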
Alternatively, the petl library can do the same in a few lines:
import petl
table = petl.fromcsv('data.csv', encoding='utf-8-sig')
table2 = petl.addfield(table, 'email_address', lambda r: [r[r1] for r1 in petl.header(table) if '#' in r[r1]])
table3 = petl.cut(table2, 'Наименование', 'email_address')
petl.tocsv(table3, 'result.csv', encoding='utf-8-sig', delimiter=';', write_header=True)
Load the CSV into a table
Create a new field (column) that is an aggregate of any field containing an email address
Reduce (cut) the table to only contain the 2 important fields: 'Наименование', 'email_address'
Output the results to a CSV
Output:
Наименование;email_address
Магазин автозапчастей;['grandauto007#mail.ru']
Спектр-Авто;['+79233940022#yandex.ru']
Be sure to install petl:
pip install petl
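One possible tweak, not part of the answer above: because email_address is built as a Python list, the output cells contain a list repr like ['grandauto007#mail.ru']. Joining the matches into a single string avoids that:
# Join all matching fields into one string so the CSV cell reads
# grandauto007#mail.ru rather than ['grandauto007#mail.ru'].
table2 = petl.addfield(table, 'email_address',
                       lambda r: ', '.join(r[f] for f in petl.header(table) if '#' in r[f]))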
I stored data in a CSV file with Python. Now I need to read it with Python, but there are some issues with it. There is a
";;;;;;"
string at the end of every line.
Here is the code that I used for writing the data to the CSV:
file = open("products.csv", "a")
writer = csv.writer(file, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow(data)
And I am trying to read it with this code:
with open("products.csv", "r", newline="") as in_file, open("3.csv", "w", newline='') as to_file:
reader = csv.reader(in_file, delimiter="," ,doublequote=True)
for row in reader:
print(row)
Of course, I am not reading it just to print it; I need to remove duplicated lines and turn it into a readable CSV.
I've tried the following to fetch the strings and edit them, and it worked for the other fields, but not for the semicolons. I can't understand why I can't edit those semicolons.
for row in reader:
    try:
        print(row)
        rowList = row[0].split(",")
        for index, field in enumerate(rowList):
            if '"' in field:
                field = field.replace('"', "")
            elif ";;;;;;" in rowList[index]:
                field = field.replace(";;;;;;", "")
            rowList[index] = field
        print(rowList)
Here is the output of the code above :
['Product Name', 'Product Description', 'SKU', 'Regular Price', 'Sale Price', 'Images;;;;;;']
Can anybody help me?
I realized that I used 'elif' there. I changed it and that solved it. Thanks for the help. But I still don't know why that semicolon ended up there.
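For anyone hitting the same thing, a minimal sketch of the fix described above (the sample value is made up): with if/elif, a field that contains both quotes and the trailing semicolons only ever gets the first replacement, so the second check has to be an independent if.
rowList = ['"Images;;;;;;"']
for index, field in enumerate(rowList):
    if '"' in field:
        field = field.replace('"', "")
    if ";;;;;;" in field:  # independent `if`, not `elif`
        field = field.replace(";;;;;;", "")
    rowList[index] = field
print(rowList)  # ['Images']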
Looking to edit rows in my CSV document, and I keep flip-flopping between two errors: "a bytes-like object is required" and "iterator should return strings, not bytes".
Running Python 3.
I have tried changing the mode from "rb" to "r", as well as placing generic test strings in the writer.writerow loop.
The CSV file is definitely comma separated, not tab separated.
I am following this youtube tutorial: https://www.youtube.com/watch?v=pOJ1KNTlpzE&t=75s (1:40)
temp_file = NamedTemporaryFile(delete=False)
with open('clientlist.csv', 'rb') as csvfile, temp_file:
    reader = csv.DictReader(csvfile)
    fieldnames = ['Account Name', 'Account Number', 'Date Last Checked']
    writer = csv.DictWriter(temp_file, fieldnames=fieldnames)
    writer.writeheader()
    print(temp_file.name)
    for row in reader:
        writer.writerow({
            'Account Name': row['Account Name'],
            'Account Number': row['Account Number'],
            'Date Last Checked': row['Date Last Checked'],
        })
#shutil.move(temp_file.name, client_list)
The expected result is that when I open the temp_file there is data in it. Then, from what I read, shutil should copy it over. Right now the temp_file is blank.
Any ideas if it would be easier to start from scratch and use numpy or pandas? Saw this video on that: https://www.youtube.com/watch?v=pbjGo3oj0PM&list=PLulVrUACBIGX8JT7vpoHVQLYqgOKeunb6&index=16&t=0s
According to the NamedTemporaryFile documentation, named temporary files are opened in w+b mode by default - i.e. binary.
Since you are reading and writing csv files, it makes no sense (to me) to operate in binary mode, so rather open the input file in r mode, and ask for a temporary file in w mode:
import csv
import tempfile

temp_file = tempfile.NamedTemporaryFile(mode='w', delete=False)  # note the mode argument

with open('clientlist.csv', 'r') as csvfile, temp_file:  # note the mode argument
    reader = csv.DictReader(csvfile)
    fieldnames = ['Account Name', 'Account Number', 'Date Last Checked']
    writer = csv.DictWriter(temp_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({
            'Account Name': row['Account Name'],
            'Account Number': row['Account Number'],
            'Date Last Checked': row['Date Last Checked'],
        })
That seems to behave for me.
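If you then want to replace the original file, as the commented-out line in the question suggests, something along these lines should work (the destination path is assumed from the question):
import shutil

# temp_file was created with delete=False and is closed when the `with`
# block above exits, so it can safely be moved over the original file.
shutil.move(temp_file.name, 'clientlist.csv')  # destination path assumed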
Here they recommend defining the encoding:
Python 3.1.3 Win 7: csv writerow Error "must be bytes or buffer, not str"
Nevertheless, why don't you open the temporary file with open?
Like this:
temp_file = open("new_file.csv", 'wb')
I am trying to process XML files one by one in a directory, reading the values and then populating a CSV file. The issue with my code is that csvWriter.writerow only writes the values from the last XML file in the directory, even though I have a loop over all the items in the root of ElementTree.parse(path). I want it to write a line for each XML file in the directory.
from lxml import etree as ElementTree
import csv
import os
import errno
import shutil

def writeData(item):
    csvFile = open('D:\\metadata.csv', 'w', newline='')
    csvWriter = csv.writer(csvFile, delimiter='|', lineterminator='\n')
    csvWriter.writerow([
        'type',
        'object',
        'title',
        'subject',
        'domain',
        'name',
        '_name',
        'version_label',
        'creator_name',
        'creation_date',
        'modifier',
        'modify_date',
        'content_type',
        'chronicle_id',
        'antecedent_id',
        'activity_date',
        'search_from_date',
        'number',
        'service_code',
        'initial_inspection_date',
        'search_to_date',
        'File Name',
        'Location',
    ])
    csvWriter.writerow([
        root[0][0].text,
        root[0][1].text,
        root[0][2].text,
        root[0][3].text,
        root[0][4].text,
        root[0][5].text,
        root[0][6].text,
        root[0][7].text,
        root[0][8].text,
        root[0][9].text,
        root[0][10].text,
        root[0][11].text,
        root[0][12].text,
        root[0][13].text,
        root[0][14].text,
        root[0][15].text,
        root[0][16].text,
        root[0][17].text,
        root[0][18].text,
        root[0][19].text,
        root[0][20].text,
        root[2].text,
        root[1].text,
    ])
    csvFile.close()

for file in os.listdir('D:\\temp\\Export\\test'):
    if file.endswith('.xml'):
        path = os.path.join('D:\\temp\\Export\\test', file)
        tree = ElementTree.parse(path)
        #print(tree)
        root = tree.getroot()
        #print(root)
        for item in root:
            print(item)
            writeData(item)
The reason you only see the data of the last XML file is that you keep overwriting the data in the final .csv file. Instead of reopening the .csv file for every write iteration, open it just once, write the header row a single time, and pass the writer into your writeData function like this:
def writeData(csv_writer, root):
    csv_writer.writerow([
        root[0][0].text,
        root[0][1].text,
        root[0][2].text,
        root[0][3].text,
        root[0][4].text,
        root[0][5].text,
        root[0][6].text,
        root[0][7].text,
        root[0][8].text,
        root[0][9].text,
        root[0][10].text,
        root[0][11].text,
        root[0][12].text,
        root[0][13].text,
        root[0][14].text,
        root[0][15].text,
        root[0][16].text,
        root[0][17].text,
        root[0][18].text,
        root[0][19].text,
        root[0][20].text,
        root[2].text,
        root[1].text,
    ])

with open('D:\\metadata.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter='|', lineterminator='\n')
    # Write the header row once, not once per XML file
    csv_writer.writerow([
        'type',
        'object',
        'title',
        'subject',
        'domain',
        'name',
        '_name',
        'version_label',
        'creator_name',
        'creation_date',
        'modifier',
        'modify_date',
        'content_type',
        'chronicle_id',
        'antecedent_id',
        'activity_date',
        'search_from_date',
        'number',
        'service_code',
        'initial_inspection_date',
        'search_to_date',
        'File Name',
        'Location',
    ])
    for file in os.listdir('D:\\temp\\Export\\test'):
        if file.endswith('.xml'):
            path = os.path.join('D:\\temp\\Export\\test', file)
            tree = ElementTree.parse(path)
            root = tree.getroot()
            writeData(csv_writer, root)
When I run this code...
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")

import csv
with open('c:\test.csv', 'w') as csvfile:
    fieldnames = ['contact_name__c', 'recordtypeid']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for con in cons['records']:
        writer.writerow({'contact_name__c': con['Id'], 'recordtypeid': '082I8294817IWfiIWX'})
print('done')
I get the following output inside my CSV file...
contact_name__c,recordtypeid

xyzzyID1xyzzy,082I8294817IWfiIWX

abccbID2abccb,082I8294817IWfiIWX
I'm not sure why those extra lines are there.
Any tips for getting rid of them so my CSV file will be normal-looking?
I'm on Python 3.4.3 according to sys.version_info.
Here are a few more code-and-output pairs, to show the kind of data I'm working with:
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
print(sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2"))
produces
OrderedDict([('totalSize', 2), ('done', True), ('records', [OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/xyzzyID1xyzzy')])), ('Id', 'xyzzyID1xyzzy'), ('Name', 'Person One')]), OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/abccbID2abccb')])), ('Id', 'abccbID2abccb'), ('Name', 'Person Two')])])])
and
from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")
for con in cons['records']:
    print(con['Id'])
produces
xyzzyID1xyzzy
abccbID2abccb
Two likely possibilities: the output file needs to be opened with newline translation disabled (the Python 3 counterpart of Python 2's "open csv files in binary mode" advice), and/or the writer needs to be told not to use DOS-style line endings.
To disable newline translation in Python 3, replace your current with open line with:
with open('c:\test.csv', 'w', newline='') as csvfile:
to eliminate the DOS style line endings try:
writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator="\n")
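Putting both suggestions together, a minimal sketch of the corrected writing code (the raw-string path is an extra observation here: 'c:\test.csv' in the question actually contains a tab character because of the \t escape):
import csv

# newline='' stops Python's text mode from translating the '\n' inside the
# writer's default '\r\n' terminator, which is what produces the blank
# lines on Windows; lineterminator="\n" additionally forces Unix endings.
with open(r'c:\test.csv', 'w', newline='') as csvfile:
    fieldnames = ['contact_name__c', 'recordtypeid']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()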