Process an input Excel file and serialize its rows - python

I have to solve an exercise, but I'm stuck on one error.
I don't know much about the Boo language, sorry.
My code is:
public class Item (IIDataReaderLoadable):
    Sequence as long
    Code as string
    Description as string
    Weight as decimal
    Id as Guid
    def LoadFromReader(reader as IDataReader):
        Sequence = long.Parse(reader[0].ToString());
        Code = reader[1].ToString();
        Weight = decimal.Parse(reader[2].ToString());
        Description = reader[3].ToString();
        Id = Guid.Parse(reader[4].ToString());
    TableName as string:
        get:
            return "Hoja1$"
operation read_MasterData_etlexcel:
    log = ProcessContext.GetLogger()
    file = ProcessContext.InputFile
    log.Info("Reading $file")
    for Data in EntityReader[of Item].Read(file):
        yield Row.FromObject(Data)
operation print_etlexcel:
    log = ProcessContext.GetLogger()
    for row in rows:
        log.Info(row.ToString())
        yield row
def serialize_row(it as Object, id as Guid):
    serializer = XmlSerializer(typeof(Item))
    writer = FileStream("output" + id.ToString() +".xml", FileMode.Create);
    serializer.Serialize(writer, it);
    writer.Close();
serialize_row(Item, Item.Id)
process process_owners_etlexcel:
    read_MasterData_etlexcel()
    print_etlexcel()
When I execute it in a command window I get the following error:
2018-05-14 14:18:44.0479 [Error] [Mss.Etl.DSLLoader.EtlSetup] Cannot execute ./excelfile/import.boo BCE0000: C:\Program Files\Mecalux\GnaService2015\excelfile\import.boo(57,30): BCE0020: Boo.Lang.Compiler.CompilerError: An instance of type 'Mss.Item' is required to access non static member 'Id'.
I want to read an Excel file that contains some columns. I have to create a Boo script that retrieves the content of the Excel file, maps each row into an object of my class Item, and serializes that object to an XML file.
Thanks

The bug is on this line:
serialize_row(Item, Item.Id)
Item.Id is an instance field, not a static field, which means you need an instance to access it. That line accesses it through the type name as if it were static, so the compiler rejects it. I'm not sure what the right fix is, because the code uses a couple of macros that aren't defined in the example and I can't tell what they do, but I would think you need to either remove that line or pass in an instance's Id or a random one.
I would have to guess this is the solution:
item = Item()
serialize_row(item, item.Id)

Related

web2py: How to execute instructions before delete using SQLFORM.smartgrid

I use SQLFORM.smartgrid to show a list of records from a table (service_types). In each row of the smartgrid there is a delete link/button to delete the record. I want to execute some code before smartgrid/web2py actually deletes the record. For example, I want to know whether there are child records (in the services table) referencing this record and, if there are, flash a message telling the user that the record cannot be deleted. How is this done?
db.py
db.define_table('service_types',
    Field('type_name', requires=[IS_NOT_EMPTY(), IS_ALPHANUMERIC()]),
    format='%(type_name)s',
)
db.define_table('services',
    Field('service_name',requires=[IS_NOT_EMPTY(),IS_NOT_IN_DB(db,'services.service_name')]),
    Field('service_type','reference service_types',requires=IS_IN_DB(db,db.service_types.id,
            '%(type_name)s',
            error_message='not in table',
            zero=None),
        ondelete='RESTRICT',
    ),
    Field('interest_rate','decimal(15,2)',requires=IS_DECIMAL_IN_RANGE(0,100)),
    Field('max_term','integer'),
    auth.signature,
    format='%(service_name)s',
)
db.services._plural='Services'
db.services._singular='Service'
if db(db.service_types).count() < 1:
    db.service_types.insert(type_name='Loan')
    db.service_types.insert(type_name='Contribution')
    db.service_types.insert(type_name='Other')
controller
def list_services():
    grid = SQLFORM.smartgrid(db.services
        , fields = [db.services.service_name,db.services.service_type]
        )
    return locals()
view
{{extend 'layout.html'}}
{{=grid}}
There are two options. First, the deletable argument can be a function that takes the Row object of a given record and returns True or False to indicate whether the record is deletable. If it returns False, the "Delete" button will not be shown for that record, nor will the delete operation be allowed on the server.
def can_delete(row):
    return True if [some condition involving row] else False
grid = SQLFORM.smartgrid(..., deletable=can_delete)
Second, there is an ondelete argument that takes the db Table object and the record ID. It is called right before the delete operation, so to prevent the delete, you can do a redirect within that function:
def ondelete(table, record_id):
    record = table(record_id)
    if [some condition]:
        session.flash = 'Cannot delete this record'
        redirect(URL())
grid = SQLFORM.smartgrid(..., ondelete=ondelete)
Note, if the grid is loaded via an Ajax component and its actions are therefore performed via Ajax, using redirect within the ondelete method as shown above will not work well, as the redirect will have no effect and the table row will still be deleted from the grid in the browser (even though the database record was not deleted). In that case, an alternative approach is to return a non-200 HTTP response to the browser, which will prevent the client-side Javascript from deleting the row from the table (the delete happens only on success of the Ajax request). We should also set response.flash instead of session.flash (because we are not redirecting/reloading the whole page):
def ondelete(table, record_id):
    record = table(record_id)
    if [some condition]:
        response.flash = 'Cannot delete this record'
        raise HTTP(403)
Note, both the deletable and ondelete arguments can be dictionaries with table names as keys, so you can specify different values for different tables that might be linked from the smartgrid.
Finally, notice the delete URLs look like /appname/list_services/services/delete/services/[record ID]. So, in the controller, you can determine if a delete is being requested by checking if 'delete' in request.args. In that case, request.args[-2:] represents the table name and record ID, which you can use to do any checks.
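As a rough sketch of that last check, assuming the URL layout quoted above (the args end in delete, table name, record ID), a helper along these lines could pull out the table name and record ID. The name parse_delete_request is hypothetical and not part of web2py. It takes a plain list so it can be tried outside a controller, and it is a little stricter than a bare 'delete' in request.args test because it looks at the position of 'delete' as well.

```python
def parse_delete_request(args):
    """Return (table_name, record_id) if args describe a delete, else None.

    Assumes the last three items look like ['delete', table_name, record_id],
    as in /appname/list_services/services/delete/services/[record ID].
    """
    args = list(args)
    if len(args) < 3 or args[-3] != 'delete':
        return None
    table_name, record_id = args[-2:]
    try:
        return table_name, int(record_id)
    except ValueError:
        # The ID segment was not numeric, so do not treat this as a delete.
        return None
```

Inside a controller you would presumably call it as parse_delete_request(request.args) before building the grid and run your checks when it returns a tuple.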
From Anthony's answer I chose the second option and came up with the following:
def ondelete_service_type(service_type_table, service_type_id):
    count = db(db.services.service_type == service_type_id).count()
    if count > 0:
        session.flash = T("Cant delete")
        #redirect(URL('default','list_service_types#'))
    else:
        pass
    return locals()
def list_service_types():
    grid = SQLFORM.smartgrid(db.service_types
        , fields = [db.service_types.type_name, db.services.service_name]
        , ondelete = ondelete_service_type
        )
    return locals()
But, if I do this...
if count > 0:
    session.flash = T("Cant delete")
else:
    pass
return locals()
I get this error:
And if I do this:
if count > 0:
    session.flash = T("Cant delete")
    redirect(URL('default','list_service_types#')) <== please take note
else:
    pass
return locals()
I get the flash error message Cant delete but the record appears deleted from the list, and reappears after a page refresh with F5 (apparently because the delete was not allowed in the database, which is intended).
Which one should I fix and how?
Note
If any of these issues is resolved I can accept Anthony's answer.

Using python class with spark DataFrame to parse URL's

I'm trying to process URL's in a pyspark dataframe using a class that I've written and a udf. I'm aware of urllib and other url parsing libraries but for this case I need to use my own code.
In order to get the tld of a url I cross check it against the iana public suffix list.
Here's a simplification of my code
class Parser:
    # list of available public suffixes for extracting top level domains
    file = open("public_suffix_list.txt", 'r')
    data = []
    for line in file:
        if line.startswith("//") or line == '\n':
            pass
        else:
            data.append(line.strip('\n'))
    def __init__(self, url):
        self.url = url
        #the code here extracts port,protocol,query etc.
        #I think this bit below is causing the error
        matches = [r for r in self.data if r in self.hostname]
        #extra functionality in my actual class
        i = matches.index(self.string)
        try:
            self.tld = matches[i]
        # logic to find tld if no match
The class works in pure python so for example I can run
import Parser
x = Parser("www.google.com")
x.tld #returns ".com"
However when I try to do
import Parser
from pyspark.sql.functions import udf
parse = udf(lambda x: Parser(x).url)
df = sqlContext.table("tablename").select(parse("column"))
When I call an action I get
File "<stdin>", line 3, in <lambda>
File "<stdin>", line 27, in __init__
TypeError: 'in <string>' requires string as left operand
So my guess is that it's failing to interpret the data as a list of strings?
I've also tried to use
file = sc.textFile("my_file.txt")\
    .filter(lambda x: not x.startswith("//") and x != "")\
    .collect()
data = sc.broadcast(file)
to open my file instead, but that causes
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Any ideas?
Thanks in advance
EDIT: Apologies, I didn't have my code to hand so my test code didn't explain very well the problems I was having. The error I initially reported was a result of the test data I was using.
I've updated my question to be more reflective of the challenge I'm facing.
Why do you need a class in this case? (The code defining your class is incorrect: you never declare self.data before using it in the init method.) The only relevant line that affects the output you want is self.string=string, so you are basically passing the identity function as the udf.
The UnicodeDecodeError is due to an encoding issue in your file, it has nothing to do with your definition of the class.
The second error is in the line sc.broadcast(file) , details of which can be found here : Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion
EDIT 1
I would redefine your class structure as follows. You basically need to create the instance attribute self.data by assigning self.data = data before you can use it. Also, anything that you write before the init method is executed irrespective of whether you call that class or not, so moving the file parsing part out will not have any effect.
# list of available public suffixes for extracting top level domains
file = open("public_suffix_list.txt", 'r')
data = []
for line in file:
    if line.startswith("//") or line == '\n':
        pass
    else:
        data.append(line.strip('\n'))
class Parser:
    def __init__(self, url):
        self.url = url
        self.data = data
        #the code here extracts port,protocol,query etc.
        #I think this bit below is causing the error
        matches = [r for r in self.data if r in self.hostname]
        #extra functionality in my actual class
        i = matches.index(self.string)
        try:
            self.tld = matches[i]
        # logic to find tld if no match
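For what it's worth, here is a minimal sketch of the same idea with the suffix loading and the matching pulled into plain functions. This is an assumption about how the code could be organised, not the asker's actual class: load_suffixes and find_tld are hypothetical names, and "longest suffix the hostname ends with" is only one plausible reading of the elided "logic to find tld".

```python
def load_suffixes(lines):
    """Collect public suffixes from an iterable of lines, skipping comments and blanks."""
    suffixes = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        suffixes.append(line)
    return suffixes


def find_tld(hostname, suffixes):
    """Return the longest suffix that hostname ends with, or None if nothing matches."""
    matches = [s for s in suffixes
               if hostname == s or hostname.endswith("." + s)]
    return max(matches, key=len) if matches else None


# Tiny made-up sample standing in for public_suffix_list.txt
sample = ["// comment line", "", "com", "uk", "co.uk"]
suffixes = load_suffixes(sample)
print(find_tld("www.google.com", suffixes))
print(find_tld("www.example.co.uk", suffixes))
```

Since suffixes ends up as an ordinary list of strings built on the driver, it should be usable from inside a udf without touching the SparkContext there, though I have not run this on a cluster.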

Variable Route Not Working in For Loop

I tried to create multiple routes in one go by using the variables from the database and a for loop.
I tried this
temp = "example"
@app.route("/speaker/<temp>")
def getSpeakerAtr(temp):
    return '''%s''' % temp
It works very well. BUT:
for x in models.Speaker.objects:
    temp = str(x.name)
    @app.route("/speaker/<temp>")
    def getSpeakerAtr(temp):
        return '''%s''' % temp
Doesn't work. The error message:
File "/Users/yang/Documents/CCPC-Website/venv/lib/python2.7/site-packages/flask/app.py", line 1013, in decorator
02:03:04 web.1 | self.add_url_rule(rule, endpoint, f, **options)
**The reason I want multiple routes is that I need to get the full data of an object by querying from the route. For example,
if we type this URL:
//.../speaker/sam
we get the object whose 'name' value is 'sam'. Then I can use all of the values in this object, like bio.**
You don't need multiple routes. Just one route that validates its value, eg:
@app.route('/speaker/<temp>')
def getSpeakerAtr(temp):
    if not any(temp == str(x.name) for x in models.Speaker.objects):
        # do something appropriate (404 or something?)
    # carry on doing something else
Or as to your real intent:
@app.route('/speaker/<name>')
def getSpeakerAtr(name):
    speaker = # do something with models.Speaker.objects to lookup `name`
    if not speaker: # or whatever check is suitable to determine name didn't exist
        # raise a 404, or whatever's suitable
    # we have a speaker object, so use as appropriate
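To make the lookup step a bit more concrete, here is a small sketch that leaves Flask and the database out so it runs on its own. The speakers list of dicts stands in for models.Speaker.objects, and find_speaker and get_speaker_attr are hypothetical names. The (body, status) tuple mirrors what a Flask view is allowed to return.

```python
def find_speaker(speakers, name):
    """Return the first speaker whose name matches, or None."""
    for speaker in speakers:
        if str(speaker["name"]) == name:
            return speaker
    return None


def get_speaker_attr(speakers, name):
    """Mimic the single view: return (body, status_code) for the requested name."""
    speaker = find_speaker(speakers, name)
    if speaker is None:
        return "No such speaker", 404
    # We have a speaker, so any of its values (bio here) can be used.
    return speaker["bio"], 200


# Made-up data standing in for models.Speaker.objects
speakers = [{"name": "sam", "bio": "Sam's bio"}]
print(get_speaker_attr(speakers, "sam"))
print(get_speaker_attr(speakers, "alex"))
```

In the real app the body of the one /speaker/<name> route would do the equivalent query against the model instead of looping over a list.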

How to convert one content type to another using Archetypes

I have a non-folderish content type that I want to convert into a folderish one. The content type contains a MultiFileField. I read http://developer.plone.org/content/archetypes/converting-content-types.html about converting one content type to another. However, when I run the migration from a browser view, the new content items are created and the values of the old items are copied, except for the uploaded files handled by the MultiFileField, which come out empty.
Here's my code inside browser view:
Updated:
def migrateaction(self):
    items=self.context.listFolderContents(contentFilter={"portal_type": 'myoldcontent'})
    for item in items:
        id = "%s-new" % item.getId()
        service = self.context.invokeFactory(
            'mynewcontent',
            id,
            rp_category=item.getRp_category(),
            familyname=item.getFamilyname(),
            firstname=item.getFirstname(),
            file=item.getField('file').getRaw(item))
    return 'Successfully migrated.'
My field definition for multifilefield:
MultiFileField('file',
    primary=True,
    languageIndependent=True,
    widget = MultiFileWidget(
        label= "File Uploads",
        show_content_type = False,)),
All field definitions are the same for both old and new content types.
Is there something lacking in my code that causes the files not to be copied?

how to import csv data into django models

I have some CSV data that I want to import into Django models. Example CSV data:
1;"02-01-101101";"Worm Gear HRF 50";"Ratio 1 : 10";"input shaft, output shaft, direction A, color dark green";
2;"02-01-101102";"Worm Gear HRF 50";"Ratio 1 : 20";"input shaft, output shaft, direction A, color dark green";
3;"02-01-101103";"Worm Gear HRF 50";"Ratio 1 : 30";"input shaft, output shaft, direction A, color dark green";
4;"02-01-101104";"Worm Gear HRF 50";"Ratio 1 : 40";"input shaft, output shaft, direction A, color dark green";
5;"02-01-101105";"Worm Gear HRF 50";"Ratio 1 : 50";"input shaft, output shaft, direction A, color dark green";
I have a Django model named Product. It has fields such as name, description and price. I want something like this:
product=Product()
product.name = "Worm Gear HRF 70(02-01-101116)"
product.description = "input shaft, output shaft, direction A, color dark green"
product.price = 100
You want to use the csv module from the Python standard library, together with Django's get_or_create method:
import csv
with open(path) as f:
    reader = csv.reader(f)
    for row in reader:
        _, created = Teacher.objects.get_or_create(
            first_name=row[0],
            last_name=row[1],
            middle_name=row[2],
        )
        # get_or_create returns a tuple: the new or existing object,
        # and a boolean telling whether it was created
In my example the Teacher model has three attributes: first_name, last_name and middle_name.
Django documentation of get_or_create method
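Applied to the question's data, a sketch could look like the following. It assumes the semicolon-delimited layout of the sample rows (sequence, code, name, ratio, description) and builds only the keyword arguments, so it runs without a Django project. rows_to_product_kwargs is a hypothetical helper name, and price is left out because the sample file has no price column.

```python
import csv
import io

# Two rows copied from the question's sample data
SAMPLE = (
    '1;"02-01-101101";"Worm Gear HRF 50";"Ratio 1 : 10";'
    '"input shaft, output shaft, direction A, color dark green";\n'
    '2;"02-01-101102";"Worm Gear HRF 50";"Ratio 1 : 20";'
    '"input shaft, output shaft, direction A, color dark green";\n'
)


def rows_to_product_kwargs(fileobj):
    """Yield dicts of Product field values from semicolon-delimited rows."""
    for row in csv.reader(fileobj, delimiter=';'):
        if not row:
            continue  # skip blank lines
        yield {
            # name is "<name>(<code>)", following the format asked for
            "name": "%s(%s)" % (row[2], row[1]),
            "description": row[4],
        }


products = list(rows_to_product_kwargs(io.StringIO(SAMPLE)))
print(products[0]["name"])
```

Each dict could then be passed on as Product.objects.get_or_create(**kwargs) inside your project, with an open file in place of the StringIO.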
If you want to use a library, a quick google search for csv and django reveals two libraries - django-csvimport and django-adaptors. Let's read what they have to say about themselves...
django-adaptors:
Django adaptor is a tool which allow you to transform easily a CSV/XML
file into a python object or a django model instance.
django-csvimport:
django-csvimport is a generic importer tool to allow the upload of CSV
files for populating data.
The first requires you to write a model to match the csv file, while the second is more of a command-line importer, which is a huge difference in the way you work with them, and each is good for a different type of project.
So which one to use? That depends on which of those will be better suited for your project in the long run.
However, you can also avoid a library altogether by writing your own Django script to import your csv file, something along the lines of (warning, rough sketch ahead):
# open file & create csv reader
import csv
# import the relevant model
from myproject.models import Foo
# loop ('data.csv' is a placeholder path)
with open('data.csv') as f:
    for i, line in enumerate(csv.reader(f), start=1):
        # add some custom validation/parsing for some of the fields
        foo = Foo(fieldname1=line[1], fieldname2=line[2])  # ... etc.
        try:
            foo.save()
        except Exception:
            # if there's a problem anywhere, you want to know about it
            print("there was a problem with line", i)
It's super easy. Hell, you can do it interactively through the django shell if it's a one-time import. Just - figure out what you want to do with your project, how many files do you need to handle and then - if you decide to use a library, try figuring out which one better suits your needs.
Use the Pandas library to create a dataframe of the csv data.
Name the fields either by including them in the csv file's first line or in code by using the dataframe's columns method.
Then create a list of model instances.
Finally use the django method .bulk_create() to send your list of model instances to the database table.
The read_csv function in pandas is great for reading csv files and gives you lots of parameters to skip lines, omit fields, etc.
import pandas as pd
from app.models import Product
tmp_data=pd.read_csv('file.csv',sep=';')
#ensure fields are named~ID,Product_ID,Name,Ratio,Description
#concatenate name and Product_id to make a new field a la Dr.Dee's answer
products = [
    Product(
        name = tmp_data.ix[row]['Name'],
        description = tmp_data.ix[row]['Description'],
        price = tmp_data.ix[row]['price'],
    )
    for row in tmp_data['ID']
]
Product.objects.bulk_create(products)
I was using the answer by mmrs151, but saving each row (instance) was very slow, and any fields containing the delimiting character (even inside quotes) were not handled by the open() / line.split(';') approach.
Pandas has so many useful features that it is worth getting to know.
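The quoting problem is easy to reproduce with the standard library alone. The line below is made up for illustration (it is not from the question's file), and it shows how a plain split on ';' breaks a quoted field while csv.reader keeps it together.

```python
import csv

# Hypothetical row whose second field contains the delimiter inside quotes
line = '7;"Gear; left-hand";"Ratio 1 : 10"'

# Naive splitting cuts the quoted field in two and keeps the quote characters
naive = line.split(';')

# csv.reader accepts any iterable of lines and respects the quoting
parsed = next(csv.reader([line], delimiter=';'))

print(naive)
print(parsed)
```

pandas.read_csv handles quoted delimiters as well, so either route avoids the problem.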
You can also use, django-adaptors
>>> from adaptor.model import CsvModel
>>> class MyCsvModel(CsvModel):
...     name = CharField()
...     age = IntegerField()
...     length = FloatField()
...
...     class Meta:
...         delimiter = ";"
You declare a MyCsvModel which will match to a CSV file like this:
Anthony;27;1.75
To import the file or any iterable object, just do:
>>> my_csv_list = MyCsvModel.import_data(data = open("my_csv_file_name.csv"))
>>> first_line = my_csv_list[0]
>>> first_line.age
27
Without an explicit declaration, data and columns are matched in the same order:
Anthony --> Column 0 --> Field 0 --> name
27 --> Column 1 --> Field 1 --> age
1.75 --> Column 2 --> Field 2 --> length
For Django 1.8, which I'm using,
I wrote a management command that creates objects dynamically,
so you just pass the file path of the csv, the model name and the app name of the relevant Django application, and it populates that model without you having to specify the field names.
For example, given this csv:
field1,field2,field3
value1,value2,value3
value11,value22,value33
it will create the objects
[{field1:value1,field2:value2,field3:value3}, {field1:value11,field2:value22,field3:value33}]
for the model whose name you pass to the command.
The command code:
from django.core.management.base import BaseCommand
from django.db.models.loading import get_model
import csv
class Command(BaseCommand):
    help = 'Creating model objects according the file path specified'
    def add_arguments(self, parser):
        parser.add_argument('--path', type=str, help="file path")
        parser.add_argument('--model_name', type=str, help="model name")
        parser.add_argument('--app_name', type=str, help="django app name that the model is connected to")
    def handle(self, *args, **options):
        file_path = options['path']
        _model = get_model(options['app_name'], options['model_name'])
        with open(file_path, 'rb') as csv_file:
            reader = csv.reader(csv_file, delimiter=',', quotechar='|')
            header = reader.next()
            for row in reader:
                _object_dict = {key: value for key, value in zip(header, row)}
                _model.objects.create(**_object_dict)
Note that in later versions
from django.db.models.loading import get_model
is no longer available and needs to be changed to
from django.apps import apps
_model = apps.get_model(options['app_name'], options['model_name'])
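As a side note, csv.DictReader from the standard library does the header-to-value pairing that the zip(header, row) line in the command does by hand. A minimal sketch on Python 3, using the three-line example csv from above (rows_as_dicts is a hypothetical helper name):

```python
import csv
import io


def rows_as_dicts(fileobj, delimiter=','):
    """Return one dict per data row, keyed by the names in the header line."""
    return [dict(row) for row in csv.DictReader(fileobj, delimiter=delimiter)]


sample = io.StringIO(
    "field1,field2,field3\n"
    "value1,value2,value3\n"
    "value11,value22,value33\n"
)
records = rows_as_dicts(sample)
print(records)
```

Each record could then be handed to _model.objects.create(**record) in the handle method, with the real file opened in text mode.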
The Python csv library can do your parsing, and your code can translate the rows into Product instances.
Something like this (using a plain split for brevity):
f = open('data.txt', 'r')
for line in f:
    line = line.split(';')
    product = Product()
    product.name = line[2] + '(' + line[1] + ')'
    product.description = line[4]
    product.price = '' #data is missing from file
    product.save()
f.close()
Write a management command in your Django app that opens the CSV file, loops over it, and creates a model instance for every row.
your_app_folder/management/commands/ProcessCsv.py
import csv
import os
from django.core.management.base import BaseCommand
from django.conf import settings
from your_app_name.models import Product
class Command(BaseCommand):
    def handle(self, *args, **options):
        with open(os.path.join(settings.BASE_DIR, 'your_csv_file.csv'), 'r') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=';')
            for row in csv_reader:
                Product.objects.create(name=row[2], description=row[3], price=row[4])
Finally, run the command to process your CSV file and insert its rows into the Product model.
Terminal:
python manage.py ProcessCsv
That's it.
If you're working with newer versions of Django (> 1.10) and don't want to spend time writing the model definition, you can use the ogrinspect tool.
It generates the code for the model definition.
python manage.py ogrinspect [/path/to/thecsv] Product
The output will be the class (model) definition. In this case the model will be called Product.
You need to copy this code into your models.py file.
Afterwards you need to migrate (in the shell) the new Product table with:
python manage.py makemigrations
python manage.py migrate
More information here:
https://docs.djangoproject.com/en/1.11/ref/contrib/gis/tutorial/
Do note that the example was written for ESRI Shapefiles, but it works pretty well with standard CSV files too.
For ingesting your data (in CSV format) you can use pandas.
import pandas as pd
your_dataframe = pd.read_csv(path_to_csv)
# Make a row iterator (this will go row by row)
iter_data = your_dataframe.iterrows()
Now every row needs to be transformed into a dictionary, and that dict is used to instantiate your model (in this case, Product()):
# python 2.x
map(lambda (i,data) : Product.objects.create(**dict(data)),iter_data)
Done, check your database now.
You can use the django-csv-importer package.
http://pypi.python.org/pypi/django-csv-importer/0.1.1
It works like a django model
class MyCsvModel(CsvModel):
    field1 = IntegerField()
    field2 = CharField()
    # etc.
    class Meta:
        delimiter = ";"
        dbModel = Product
And you just have to:
CsvModel.import_from_file("my file")
That will automatically create your products.
You can give a try to django-import-export. It has nice admin integration, changes preview, can create, update, delete objects.
This is based on Erik's earlier answer, but I've found it easiest to read in the .csv file using pandas and then create a new instance of the class for every row in the data frame.
This example uses iloc, as ix has been removed from recent versions of pandas. I don't know about Erik's situation, but you need to create the list outside of the for loop, otherwise each iteration will overwrite it instead of appending to it.
import pandas as pd
df = pd.read_csv('path_to_file', sep='delimiter')
products = []
for i in range(len(df)):
    products.append(
        Product(
            name=df.iloc[i][0],
            description=df.iloc[i][1],
            price=df.iloc[i][2],
        )
    )
Product.objects.bulk_create(products)
This is just breaking the DataFrame into an array of rows and then selecting each column out of that array off the zero index. (i.e. name is the first column, description the second, etc.)
Hope that helps.
Here's a django egg for it:
django-csvimport
Consider using Django's built-in deserializers. Django's docs are well-written and can help you get started. Consider converting your data from csv to XML or JSON and using a deserializer to import the data. If you're doing this from the command line (rather than through a web request), the loaddata manage.py command will be especially helpful.
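A rough sketch of the conversion step, assuming the question's semicolon-delimited layout and Django's JSON fixture shape (a list of entries with "model", "pk" and "fields"). The label "myapp.product" is a placeholder for your own app and model, csv_to_fixture is a hypothetical helper name, and price is omitted because the sample data has no price column, so a required price field would need a value added.

```python
import csv
import io
import json


def csv_to_fixture(fileobj, model_label):
    """Build a list of fixture entries from semicolon-delimited rows."""
    fixture = []
    for row in csv.reader(fileobj, delimiter=';'):
        if not row:
            continue  # skip blank lines
        fixture.append({
            "model": model_label,
            "pk": int(row[0]),
            "fields": {
                "name": "%s(%s)" % (row[2], row[1]),
                "description": row[4],
            },
        })
    return fixture


# One row copied from the question's sample data
sample = io.StringIO(
    '1;"02-01-101101";"Worm Gear HRF 50";"Ratio 1 : 10";'
    '"input shaft, output shaft, direction A, color dark green";\n'
)
fixture = csv_to_fixture(sample, "myapp.product")
print(json.dumps(fixture, indent=2))
```

Writing that JSON to a file gives you something you can pass to python manage.py loaddata.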
Define a class in models.py with a function in it.
import csv
class all_products(models.Model):
    @staticmethod
    def get_all_products():
        items = []
        with open('EXACT FILE PATH OF YOUR CSV FILE','r') as fp:
            # You can also put the relative path of csv file
            # with respect to the manage.py file
            reader1 = csv.reader(fp, delimiter=';')
            for value in reader1:
                items.append(value)
        return items
You can access the ith element of the list as items[i].
