I'm new to the Bonobo library and built a simple flow:
read a simple CSV called input.csv with header: Header1, Header2, Header3, Header4
append a new column which is the concatenation of the others
write the result to a CSV file called output.csv
I'm using the built-in CsvReader and CsvWriter from bonobo to make it simple.
First I was stuck with the CsvReader not sending the headers along with the cells, and a suggested workaround was adding the
@use_raw_input
decorator to the transformation coming right after the CsvReader. But when passing content to the next activity, the bag once again loses its header and is seen as a plain tuple. It works if and only if I explicitly name the fields:
def process_rows(Header1, Header2, Header3, Header4)
My code is below (put a breakpoint in process_rows to see that you get a tuple without the header):
import bonobo
from bonobo.config import use_raw_input
# region constants
INPUT_PATH = 'input.csv'
OUTPUT_PATH = 'output.csv'
EXPECTED_HEADER = ('Header1', 'Header2', 'Header3', 'Header4')
# endregion constants
# This is stupid because all rows are checked instead of only the first
@use_raw_input  # mandatory to get the header
def validate_header(input):
    if input._fields != EXPECTED_HEADER:
        raise ValueError("This file has an unexpected header, won't be processed")
    yield input

def process_rows(*input):
    concat = ""
    for elem in input:
        concat += elem
    result = input + (concat,)
    yield result
# region bonobo + main
def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(bonobo.CsvReader(INPUT_PATH, delimiter=','),
                    validate_header,
                    process_rows,
                    bonobo.CsvWriter(OUTPUT_PATH))
    return graph

def get_services(**options):
    return {}

if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    with bonobo.parse_args(parser) as options:
        bonobo.run(
            get_graph(**options),
            services=get_services(**options)
        )
# endregion bonobo + main
Thanks for your time and help!
I did some investigating and found this "FUTURE" document that I think is what you are after:
http://docs.bonobo-project.org/en/master/guide/future/transformations.html
But it is not implemented.
I found this similar question: Why does Bonobo's CsvReader() method yield tuples and not dicts?
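For the record, a minimal sketch of the named-fields workaround the asker mentions (it only works because the parameter names are written to match the CSV header exactly; the yielded row simply re-emits the fields plus the concatenation):

def process_rows(Header1, Header2, Header3, Header4):
    concat = Header1 + Header2 + Header3 + Header4
    yield Header1, Header2, Header3, Header4, concat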
My task is to find the IfcQuantityArea values of all IfcWall elements in project.ifc and export those values to .csv together with other attributes such as GlobalId and Name.
The question is how I can "define" the result from the def function, so I can store it as a variable or list and insert it into a column in my new .csv file.
I tried several ways, and when I print the values they look fine, but I have no idea how to collect them into my .csv file. Maybe there is another approach to get the IfcWall areas using some API functions? Any ideas, for both the Python and the ifcopenshell side?
import ifcopenshell

def print_quantities(property_definition):
    if 'IfcElementQuantity' == property_definition.is_a():
        for quantity in property_definition.Quantities:
            if 'IfcQuantityArea' == quantity.is_a():
                print('Area value: ' + str(quantity.AreaValue))

model = ifcopenshell.open('D:/.../project-modified.ifc')
products = model.by_type('IfcWall')
for product in products:
    if product.IsDefinedBy:
        definitions = product.IsDefinedBy
        for definition in definitions:
            if 'IfcRelDefinesByProperties' == definition.is_a():
                property_definition = definition.RelatingPropertyDefinition
                print_quantities(property_definition)
            if 'IfcRelDefinesByType' == definition.is_a():
                type = definition.RelatingType
                if type.HasPropertySets:
                    for property_definition in type.HasPropertySets:
                        print_quantities(property_definition)
import csv

header = ['GlobalId', 'Name', 'TotalArea']
data = []
for wall in model.by_type('IfcWall'):
    row = [wall.GlobalId, wall.Name, AreaValue]
    data.append(row)

with open('D:/.../quantities.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(data)
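One possible way to wire the two halves together is to have the helper return the area instead of printing it, and to collect one row per wall while writing (this is only a sketch: get_area and wall_area are made-up names, and the walk just mirrors the IfcRelDefinesByProperties branch of the code above):

import csv
import ifcopenshell

def get_area(property_definition):
    # Return the IfcQuantityArea value carried by this definition, if any.
    if 'IfcElementQuantity' == property_definition.is_a():
        for quantity in property_definition.Quantities:
            if 'IfcQuantityArea' == quantity.is_a():
                return quantity.AreaValue
    return None

def wall_area(wall):
    # Same relation walk as in the question, but collecting instead of printing.
    for definition in wall.IsDefinedBy:
        if 'IfcRelDefinesByProperties' == definition.is_a():
            area = get_area(definition.RelatingPropertyDefinition)
            if area is not None:
                return area
    return None

model = ifcopenshell.open('D:/.../project-modified.ifc')
with open('D:/.../quantities.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['GlobalId', 'Name', 'TotalArea'])
    for wall in model.by_type('IfcWall'):
        writer.writerow([wall.GlobalId, wall.Name, wall_area(wall)])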
I'm writing a to-do list application, and to store the task class objects I'm pickling a list of the objects created. However, when I load the data, the list appears empty. The way I structured it is to create an empty list each session, then append the contents of the pickle file to it. When new tasks are created, they are appended to the list, and the whole list is then pickled again and reloaded.
This is my first real software project, so my code looks pretty rough. I reviewed it and can't find any glaring errors, but obviously I am doing something wrong.
Here is the relevant code:
import _pickle as pickle
import os.path
from os import path
from datetime import datetime

# checks if data exists, and creates file if it does not
if path.exists('./tasks.txt') != True:
    open("./tasks.txt", 'wb')
else:
    pass

# define class for tasks
class task:
    def __init__(self, name, due, category):
        self.name = name
        self.due = datetime.strptime(due, '%B %d %Y %I:%M%p')
        self.category = category

    def expand(self):  # returns the contents of the task
        return str(self.name) + " is due in " + str((self.due - datetime.now()))

data = []

# load data to list
def load_data():
    with open('tasks.txt', 'rb') as file:
        while True:
            data = []
            try:
                data.append(pickle.load(file))
            except EOFError:
                break

...

# returns current task list
def list_tasks():
    clear()
    if not data:
        print("Nothing to see here.")
    else:
        i = 1
        for task in data:
            print("%s. %s" % (i, task.expand()))
            i = i + 1

# define function to add tasks
def addTask(name, due, category):
    newTask = task(name, due, category)
    data.append(newTask)
    with open('./tasks.txt', 'wb') as file:
        pickle.dump(data, file)
    load_data()
    list_tasks()

...

load_data()
list_tasks()
startup()
ask()
data = []
# load data to list
def load_data():
    with open('tasks.txt', 'rb') as file:
        while True:
            data = []
            try:
                data.append(pickle.load(file))
            except EOFError:
                break
That second data = [] doesn't look right. Having data = [] both inside and outside of the function creates two data objects, and the one you're appending to won't be accessible anywhere else. And even if it was accessible, it would still be empty since it's being reset to [] in every iteration of the while loop. Try erasing the inner data = []. Then the data.append call will affect the globally visible data, and its contents won't be reset in each loop.
Additionally, going by the rest of your code it looks like that data is supposed to be a list of tasks. But if you pickle a list of tasks and then run data.append(pickle.load(file)), then data will be a list of lists of tasks instead. One way to keep things flat is to use extend instead of append.
data = []
# load data to list
def load_data():
    with open('tasks.txt', 'rb') as file:
        while True:
            try:
                data.extend(pickle.load(file))
            except EOFError:
                break
I think it may also be possible to load the data with a single load call, rather than many calls in a loop. It depends on whether your tasks.txt file is the result of a single pickle.dump call, or whether you appended to it with multiple pickle.dump calls while the file was opened in "append" mode.
def load_data():
    with open('tasks.txt', 'rb') as file:
        return pickle.load(file)

data = load_data()
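For completeness, a sketch of the matching save side under that single-dump assumption (this mirrors the addTask in the question, which already rewrites the whole list with one pickle.dump):

def save_data():
    # One dump of the whole list, so load_data() can read it back with one load.
    with open('tasks.txt', 'wb') as file:
        pickle.dump(data, file)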
Forewarning: I am very new to Python and programming in general. I am trying to use Python 3 to get some CSV data and make some changes to it before writing it to a file. My problem lies in accessing the CSV data from a variable, like so:
import csv
import requests
csvfile = session.get(url)
reader = csv.reader(csvfile.content)
for row in reader:
    do(something)
This returns:
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)
Googling revealed that I should be feeding the reader text instead of bytes, so I also attempted:
reader = csv.reader(csvfile.text)
This also does not work, as the loop goes through it letter by letter instead of line by line. I also experimented with TextIOWrapper and similar options with no success. The only way I have managed to get this to work is by writing the data to a file, reading it, and then making changes, like so:
csvfile = session.get(url)
with open("temp.txt", 'wb') as f:
    f.write(csvfile.content)
with open("temp.txt", 'rU', encoding="utf8") as data:
    reader = csv.reader(data)
    for row in reader:
        do(something)
I feel like this is far from the most optimal way of doing this, even if it works. What is the proper way to read and edit the CSV data directly from memory, without having to save it to a temporary file?
You don't have to write to a temp file; here is what I would do, using the "csv" and "requests" modules:
import csv
import requests

__csvfilepathname__ = r'c:\test\test.csv'
__url__ = 'https://server.domain.com/test.csv'

def csv_reader(filename, enc='utf_8'):
    with open(filename, 'r', encoding=enc) as openfileobject:
        reader = csv.reader(openfileobject)
        for row in reader:
            # do something
            print(row)
    return

def csv_from_url(url):
    line = ''
    datalist = []
    s = requests.Session()
    r = s.get(url)
    for x in r.text.replace('\r', ''):
        if not x[0] == '\n':
            line = line + str(x[0])
        else:
            datalist.append(line)
            line = ''
    datalist.append(line)
    # at this point you already have a data list 'datalist'
    # no need really to use the csv.reader object, but here goes:
    reader = csv.reader(datalist)
    for row in reader:
        # do something
        print(row)
    return

def main():
    csv_reader(__csvfilepathname__)
    csv_from_url(__url__)
    return

if __name__ == '__main__':
    main()
Not very pretty, and probably not very good with regard to memory/performance, depending on how "big" your csv/data is.
HTH, Edwin.
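As a side note, a shorter in-memory route (not from the answer above, just a sketch assuming the response body is plain CSV text) is to split the downloaded text into lines and hand those straight to csv.reader, which accepts any iterable of strings:

import csv
import requests

r = requests.get('https://server.domain.com/test.csv')  # URL reused from the example above
reader = csv.reader(r.text.splitlines())
for row in reader:
    print(row)  # do something with each parsed row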
I have a csv file with several hundred organism IDs and a second csv file with several thousand organism IDs and additional characteristics (taxonomic information, abundances per sample, etc.).
I am trying to write a code that will extract the information from the larger csv using the smaller csv file as a reference. Meaning it will look at both the smaller and larger files, and if an ID is in both files, it will extract all the information from the larger file and write it to a new file (basically write the entire row for that ID).
So far I have written the following, and while the code does not error out on me, I get a blank file in the end and I don't exactly know why. I am a graduate student who knows some simple coding but I'm still very much a novice.
Thank you.
import sys
import csv
import os.path

SparCCnames = open(sys.argv[1], "rU")
OTU_table = open(sys.argv[2], "rU")
new_file = open(sys.argv[3], "w")

Sparcc_OTUs = csv.writer(new_file)

d = csv.DictReader(SparCCnames)
ids = csv.DictReader(OTU_table)

for record in ids:
    idstopull = record["OTUid"]
    if idstopull[0] == "OTUid":
        continue
    if idstopull[0] in d:
        new_id.writerow[idstopull[0]]

SparCCnames.close()
OTU_table.close()
new_file.close()
I'm not sure what you're trying to do in your code but you can try this:
import csv

def csv_to_dict(csv_file_path):
    csv_file = open(csv_file_path, 'rb')
    csv_file.seek(0)
    sniffdialect = csv.Sniffer().sniff(csv_file.read(10000), delimiters='\t,;')
    csv_file.seek(0)
    dict_reader = csv.DictReader(csv_file, dialect=sniffdialect)
    csv_file.seek(0)
    dict_data = []
    for record in dict_reader:
        dict_data.append(record)
    csv_file.close()
    return dict_data

def dict_to_csv(csv_file_path, dict_data):
    csv_file = open(csv_file_path, 'wb')
    writer = csv.writer(csv_file, dialect='excel')
    headers = dict_data[0].keys()
    writer.writerow(headers)
    # headers must be the same with dat.keys()
    for dat in dict_data:
        line = []
        for field in headers:
            line.append(dat[field])
        writer.writerow(line)
    csv_file.close()

if __name__ == "__main__":
    big_csv = csv_to_dict('/path/to/big_csv_file.csv')
    small_csv = csv_to_dict('/path/to/small_csv_file.csv')
    output = []
    for s in small_csv:
        for b in big_csv:
            if s['id'] == b['id']:
                output.append(b)
    if output:
        dict_to_csv('/path/to/output.csv', output)
    else:
        print "Nothing."
Hope that will help.
You need to read the data into a data structure; assuming OTUid is unique, you can store it in a dictionary for fast lookup:
with open(sys.argv[1], "rU") as SparCCnames:
    d = csv.DictReader(SparCCnames)
    fieldnames = d.fieldnames
    data = {i['OTUid']: i for i in d}

with open(sys.argv[2], "rU") as OTU_table, open(sys.argv[3], "w") as new_file:
    Sparcc_OTUs = csv.DictWriter(new_file, fieldnames)
    ids = csv.DictReader(OTU_table)
    for record in ids:
        if record['OTUid'] in data:
            Sparcc_OTUs.writerow(data[record['OTUid']])
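One detail probably worth adding (it is not in the answer above): DictWriter does not emit the header row by itself, so a writeheader() call right after creating the writer would put it back:

Sparcc_OTUs = csv.DictWriter(new_file, fieldnames)
Sparcc_OTUs.writeheader()  # write the header row before copying the matching records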
Thank you everyone for your help. I played with things and consulted with an advisor, and finally got a working script. I am posting it in case it helps someone else in the future.
Thanks!
import sys
import csv

input_file = csv.DictReader(open(sys.argv[1], "rU"))  # has all info
ref_list = csv.DictReader(open(sys.argv[2], "rU"))  # reference list
output_file = csv.DictWriter(
    open(sys.argv[3], "w"), input_file.fieldnames)  # to write output file with headers
output_file.writeheader()  # write headers in output file

white_list = {}  # create empty dictionary
for record in ref_list:  # for every line in my reference list
    white_list[record["Sample_ID"]] = None  # store into the dictionary the IDs as keys

for record in input_file:  # for every line in my input file
    record_id = record["Sample_ID"]  # store ID into variable record_id
    if (record_id in white_list):  # if the ID is in the reference list
        output_file.writerow(record)  # write the entire row into a new file
    else:  # if it is not in my reference list
        continue  # ignore it and continue iterating through the file
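For anyone reusing this, the invocation looks something like the following (the script and file names are placeholders; the argument order matches the comments above: full table first, reference list second, output last):

python filter_by_reference.py full_OTU_table.csv reference_ids.csv filtered_output.csv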
Can someone help me figure out what I'm doing wrong?
I'm writing a python shell script that takes an ldif file and a csv file and then appends the contents of the csv file to the end of each record in the ldif. Something like:
Sample CSV:
"KEY","VALUE"
"abc","def"
"foo","bar"
"qwop","flop"
Sample .ldif:
dn: Aziz
cn: Aziz_09
dn: Carl
cn: Carl_04
After running python myscript.py "sample.ldif" "sample.csv":
dn: Aziz
cn: Aziz_09
KEY: VALUE
abc: def
foo: bar
qwop: flop
dn: Carl
cn: Carl_04
KEY: VALUE
abc: def
foo: bar
qwop: flop
So far my code runs, but it doesn't modify the file correctly. I'm creating an object that takes a csv file path string on creation and then stores the keys into one list field and the values into another. I then open the ldif file, scan for the blank lines between records, and insert the list fields (KEY and VALUE) at the end of each record:
import sys, csv

# Make new object that can open a csv and set csv data in its arrays
class Container(object):
    def __init__(self, filename=None, keys=None, values=None):
        self.filename = filename
        self.keys = []
        self.values = []

    # Opens self.filename and puts 0th and 1st rows into keys and values respectively
    def csv_to_list():
        with open(self.filename, 'rb') as f:
            reader = csv.reader(f)
            for row in reader:
                self.keys = row[0]
                self.values = row[1]

haruhi = Container("./content/test_pairs.txt")
haruhi.csv_to_list

# open first argument of the command line call to ldif_record_a.py for read/writing
with open(sys.argv[1], 'r+') as f1:
    lines = [x.strip() for x in f1]  # Create list with each line as an element
    f1.truncate(0)
    f1.seek(0)
    count = 0
    for x in lines:
        if x:
            f1.write(x + '\n')
        else:
            f1.write("{0}: {1}\n\n".format(haruhi.keys[count], haruhi.values[count]))
            count = count + 1
    f1.write("{0}: {1}\n\n".format(haruhi.keys[count], haruhi.values[count]))
I am new to Python! Any help, advice and/or resource direction would be greatly appreciated! Thank you SO
Okay, I adhoc'd this, so it needs work, but here goes:
import csv
import re
csv_data = list(csv.reader(open('/home/jon/tmp/data.csv'))) # (1)
csv_text = '\n' + '\n'.join('{0} : {1}'.format(*row) for row in csv_data) # (2)
with open('/home/jon/tmp/other.ldif') as f:
    contents = f.read()  # (3)

print re.sub(r'(\n\n)|(\n$)', csv_text + '\n\n', contents)  # (4)
(1) Read the CSV file into a list of lists
csv_data == [['KEY', 'VALUE'], ['abc', 'def'], ['foo', 'bar'], ['qwop', 'flop']]
(2) Create a text representation to be appended to each ldif record
KEY : VALUE
abc : def
foo : bar
qwop : flop
(3) Open and read the entire contents into memory (not very efficient mind you)
(4) Use a regular expression to find the "next bit" after each ldif record and put the text in
Prints:
dn: Aziz
cn: Aziz_09
KEY : VALUE
abc : def
foo : bar
qwop : flop
dn: Carl
cn: Carl_04
KEY : VALUE
abc : def
foo : bar
qwop : flop
You'll need to adjust it to write the data back out or whatever you want, but it is a possible starting point. I strongly recommend you use it as a base to work through, accompanied by the Python manual. Feel free to ask for any clarification.
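For instance, a minimal sketch of the write-back step, reusing csv_text and contents from the snippet above (the output file name is made up; you could also overwrite the original in place):

new_contents = re.sub(r'(\n\n)|(\n$)', csv_text + '\n\n', contents)

# Write the modified records back out, here to a new file.
with open('/home/jon/tmp/other_with_csv.ldif', 'w') as f:
    f.write(new_contents)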