How to fill PDF forms using Python - python

I have a PDF form created using Adobe LiveCycle Designer ES 10.4. I need to fill it using Python so that we can reduce manual labor. I searched the web and read some article most of them were focused around pdfrw library, I tried using it and extracted some information from PDF form as shown below
Code
from pdfrw import PdfReader
pdf = PdfReader('sample.pdf')
print(pdf.keys())
print(pdf.Info)
print(pdf.Root.keys())
print('PDF has {} pages'.format(len(pdf.pages)))
Output
['/Root', '/Info', '/ID', '/Size']
{'/CreationDate': "(D:20180822164509+05'30')", '/Creator': '(Adobe LiveCycle Designer ES 10.4)', '/ModDate': "(D:20180822165611+05'30')", '/Producer': '(Adobe XML Form Module Library)'}
['/AcroForm', '/MarkInfo', '/Metadata', '/Names', '/NeedsRendering', '/Pages', '/Perms', '/StructTreeRoot', '/Type']
PDF has 1 pages
I am not sure how further I can use pdfrw to access the fillable fields from the PDF form and fill them using Python is it possible. Any suggestions would be helpful.

You can find the form fields here:
pdf.Root.AcroForm.Fields
or here
pdf.Root.Pages.Kids[page_index].Annots
This is a PdfArray object. Basically a List.
The Name of the field is found here:
pdf.Root.AcroForm.Fields[field_index].T
Other keys include the value .V
There's a bunch of display information, like the font etc under .AP.N.Resources
However, if you update the value for a field and output the pdf file. It might only display the value when the field has focus i.e is clicked on.
I haven't figured out how to fix that yet.

I wrote a library built upon:'pdfrw', 'pdf2image', 'Pillow', 'PyPDF2' called fillpdf (pip install fillpdf and poppler dependency conda install -c conda-forge poppler)
Basic usage:
from fillpdf import fillpdfs
fillpdfs.get_form_fields("blank.pdf")
# returns a dictionary of fields
# Set the returned dictionary values a save to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)
data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}
fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)
# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')
More info here:
https://github.com/t-houssian/fillpdf
If some fields don't fill, you can use fitz (pip install PyMuPDF) and PyPDF2 (pip install PyPDF2) like the following altering the points as needed:
import fitz
from PyPDF2 import PdfFileReader
file_handle = fitz.open('blank.pdf')
pdf = PdfFileReader(open('blank.pdf','rb'))
box = pdf.getPage(0).mediaBox
w = box.getWidth()
h = box.getHeight()
# For images
image_rectangle = fitz.Rect((w/2)-200,h-255,(w/2)-100,h-118)
pages = pdf.getNumPages() - 1
last_page = file_handle[pages]
last_page._wrapContents()
last_page.insertImage(image_rectangle, filename=f'image.png')
# For text
last_page.insertText(fitz.Point((w/2)-247 , h-478), 'John Smith', fontsize=14, fontname="times-bold")
file_handle.save(f'newpdf.pdf')

Use this to fill every fields if they are indexed.
template = PdfReader('template.pdf')
page_c = 0
while page_c < len(template.Root.Pages.Kids): #LOOP through pages
annot_c = 0
while annot_c < len(template.Root.Pages.Kids[page_c].Annots): #LOOP through fields
template.Root.Pages.Kids[page_c].Annots[annot_c].update(PdfDict(V=str(annot_c)+'-'+str(page_c)))
annot_c=annot_c+1
page_c=page_c+1
PdfWriter().write('output.pdf', template)

AcroForm based Forms using PDFix SDK
def SetFormFieldValue(email, key, open_path, save_path):
pdfix = GetPdfix()
if pdfix is None:
raise Exception('Pdfix Initialization fail')
if not pdfix.Authorize(pdfix_email, pdfix_license):
raise Exception('Authorization fail : ' + pdfix.GetError())
doc = pdfix.OpenDoc(open_path, "")
if doc is None:
raise Exception('Unable to open pdf : ' + pdfix.GetError())
field = doc.GetFormFieldByName("Text1")
if field is not None:
value = field.GetValue()
value = "New Value"
field.SetValue(value)
if not doc.Save(save_path, kSaveFull):
raise Exception(pdfix.GetError())
doc.Close()
pdfix.Destroy()

A full solution was provided here: How to edit editable pdf using the pdfrw library?
The key part is the:
template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))

Related

Merge jpg links into pdf python

I'm trying to use kissmanga api to add mangas to my website.
This is the code:
from kissmanga import get_search_results, get_manga_details, get_manga_episode, get_manga_chapter
manga_search = get_search_results(query="Attack on titan")
for k in manga_search:
titleManga=(k.get('title' ))
for k in manga_search:
IdManga=(k.get('mangaid' ))
for k in manga_search:
print(titleManga)
manga_chapter = get_manga_chapter(mangaid=IdManga, chapNumber=1)
print(manga_chapter)
However, when I print manga_chapter I get:
{'totalPages': "['https://cdn.mangaclash.com/manga_5f3c9f1374eb8/237848eb4cd4b762b981f4e863a3edf9/1.jpg', 'https://cdn.mangaclash.com/manga_5f3c9f1374eb8/237848eb4cd4b762b981f4e863a3edf9/2.jpg', 'https://cdn.mangaclash.com/manga_5f3c9f1374eb8/237848eb4cd4b762b981f4e863a3edf9/3.jpg', 'https://cdn.mangaclash.com/manga_5f3c9f1374eb8/237848eb4cd4b762b981f4e863a3edf9/4.jpg', ']"}
How would I go about combining those jpgs into a pdf that a user can later download from my site? (so sends pdf to user and then deletes it to save space)?
I tried separating them individually with json but no luck, still kinda newish to python.

unotools insert image into document (libreoffice)

I'm trying to insert an image into a libreoffice document that is handled/controlled by unotools.
Therefore I start LibreOffice with this command:
soffice --accept='socket,host=localhost,port=8100;urp;StarOffice.Service'
Inside my python code I can connect to LibreOffice:
from unotools import Socket, connect
from unotools.component.writer import Writer
context = connect(Socket('localhost', 8100))
writer = Writer(context)
(This code is taken from this documentation: https://pypi.org/project/unotools/)
By using writer.set_string_to_end() I can add some text to the document. But I also want to insert an image into the document. So far I couldn't find any resource where this was done. The image is inside of my clipboard, so ideally I want to insert the image directly from there. Alternatively I can save the image temporarily and insert the saved file.
Is there any known way how to insert images by using unotools? Any alternative solution would also be great.
I've found a way to insert images by using uno instead of unotools:
import uno
from com.sun.star.awt import Size
from pythonscript import ScriptContext
def connect_to_office():
if not 'XSCRIPTCONTEXT' in globals():
localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext(
'com.sun.star.bridge.UnoUrlResolver', localContext )
client = resolver.resolve("uno:socket,host=localhost,port=8100;urp;StarOffice.ComponentContext" )
global XSCRIPTCONTEXT
XSCRIPTCONTEXT = ScriptContext(client, None, None)
def insert_image(doc):
size = Size()
path = uno.systemPathToFileUrl('/somepath/image.png')
draw_page = self.doc.DrawPage
image = doc.createInstance( 'com.sun.star.drawing.GraphicObjectShape')
image.GraphicURL = path
draw_page.add(image)
size.Width = 7500
size.Height = 5000
image.setSize(size)
image.setPropertyValue('AnchorType', 'AT_FRAME')
connect_to_office()
doc = XSCRIPTCONTEXT.getDocument()
insert_image(doc)
sources:
https://ask.libreoffice.org/en/question/38844/how-do-i-run-python-macro-from-the-command-line/
https://forum.openoffice.org/en/forum/viewtopic.php?f=45&t=80302
I still don't know how to insert an image from my clipboard, I worked around that problem by saving the image first. If someone knows a way to insert the image directly from the clipboard that would still be helpful.

How can I download a specific part of Coco Dataset?

I am developing an object detection model to detect ships using YOLO. I want to use the COCO dataset. Is there a way to download only the images that have ships with the annotations?
To download images from a specific category, you can use the COCO API. Here's a demo notebook going through this and other usages. The overall process is as follows:
Install pycocotools
Download one of the annotations jsons from the COCO dataset
Now here's an example on how we could download a subset of the images containing a person and saving it in a local file:
from pycocotools.coco import COCO
import requests
# instantiate COCO specifying the annotations json path
coco = COCO('...path_to_annotations/instances_train2014.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
Which returns a list of dictionaries with basic information on the images and its url. We can now use requests to GET the images and write them into a local folder:
# Save the images into a local folder
for im in images:
img_data = requests.get(im['coco_url']).content
with open('...path_saved_ims/coco_person/' + im['file_name'], 'wb') as handler:
handler.write(img_data)
Note that this will save all images from the specified category. So you might want to slice the images list to the first n.
From what I personally know, if you're talking about the COCO dataset only, I don't think they have a category for "ships". The closest category they have is "boat". Here's the link to check the available categories: http://cocodataset.org/#overview
BTW, there are ships inside the boat category too.
If you want to just select images of a specific COCO category, you might want to do something like this (taken and edited from COCO's official demos):
# display COCO categories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))
# get all images containing given categories (I'm selecting the "bird")
catIds = coco.getCatIds(catNms=['bird']);
imgIds = coco.getImgIds(catIds=catIds);
Nowadays there is a package called fiftyone with which you could download the MS COCO dataset and get the annotations for specific classes only. More information about installation can be found at https://github.com/voxel51/fiftyone#installation.
Once you have the package installed, simply run the following to get say the "person" and "car" classes:
import fiftyone.zoo as foz
# To download the COCO dataset for only the "person" and "car" classes
dataset = foz.load_zoo_dataset(
"coco-2017",
split="train",
label_types=["detections", "segmentations"],
classes=["person", "car"],
# max_samples=50,
)
If desired, you can comment out the last option to set a maximum samples size. Moreover, you can change the "train" split to "validation" in order to obtain the validation split instead.
To visualize the dataset downloaded, simply run the following:
# Visualize the dataset in the FiftyOne App
import fiftyone as fo
session = fo.launch_app(dataset)
If you would like to download the splits "train", "validation", and "test" in the same function call of the data to be loaded, you could do the following:
dataset = foz.load_zoo_dataset(
"coco-2017",
splits=["train", "validation", "test"],
label_types=["detections", "segmentations"],
classes=["person"],
# max_samples=50,
)
I tried the code that #yatu and #Tim had shared here, but I got lots of requests.exceptions.ConnectionError: HTTPSConnectionPool.
So after carefully reading this answer to Max retries exceeded with URL in requests, I rewrote the code like this one and now it runs smoothly:
from pycocotools.coco import COCO
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import requests
from tqdm.notebook import tqdm
# instantiate COCO specifying the annotations json path
coco = COCO('annotations/instances_train2017.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
# handle annotations
ANNOTATIONS = {"info": {
"description": "my-project-name"
}
}
def cocoJson(images: list) -> dict:
arrayIds = np.array([k["id"] for k in images])
annIds = coco.getAnnIds(imgIds=arrayIds, catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)
for k in anns:
k["category_id"] = catIds.index(k["category_id"])+1
catS = [{'id': int(value), 'name': key}
for key, value in categories.items()]
ANNOTATIONS["images"] = images
ANNOTATIONS["annotations"] = anns
ANNOTATIONS["categories"] = catS
return ANNOTATIONS
def createJson(JsonFile: json, label='train') -> None:
name = label
Path("data/labels").mkdir(parents=True, exist_ok=True)
with open(f"data/labels/{name}.json", "w") as outfile:
json.dump(JsonFile, outfile)
def downloadImages(images: list) -> None:
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
for im in tqdm(images):
if not isfile(f"data/images/{im['file_name']}"):
img_data = session.get(im['coco_url']).content
with open('data/images/' + im['file_name'], 'wb') as handler:
handler.write(img_data)
trainSet = cocoJson(images)
createJson(trainSet)
downloadImages(images)
On my side I had recent difficulties installing fiftyone with Apple Silicon Mac (M1), so I created a script based on pycocotools that allows me to quickly download a subset of the coco 2017 dataset (images and annotations).
It is very simple to use, details are available here: https://github.com/tikitong/minicoco , hope this helps.

Setting pgNumType property in python-docx is without effect

I'm trying to set page numbers in a word document using python-docx. I found an attribute pgNumType (pageNumberType), which I'm setting with this code:
document = Document()
section = document.add_section(WD_SECTION.CONTINUOUS)
sections = document.sections
sectPr = sections[0]._sectPr
pgNumType = OxmlElement('w:pgNumType')
pgNumType.set(qn('w:fmt'), 'decimal')
pgNumType.set(qn('w:start'), '1')
sectPr.append(pgNumType)
This code does nothing, no page numbers are in the output document. I did the same with a lnNumType attribute which is for line numbers and it worked fine. So what is it with the pgNumType attribute? The program executes without error, so the attribute exists. But anyone knows why it has no effect?
While your page style setting is fine, but it does not automatically insert a page number field anywhere in your document. Normally, page numbers appear in a header or a footer, but unfortunately, python-docx does not currently support headers, footers or fields. The former two appear to be an ongoing work in progress: https://github.com/python-openxml/python-docx/issues/104.
The linked issue mentions a number of workarounds. The one I have found to be most robust is to create an otherwise blank document with the headers and footers set up exactly the way you want in MS Word. You can then load and append to that document instead of the default template that docx.Document returns.
This technique is implied to be the suggested method for editing headers in the official documentation:
A lot of how a document looks is determined by the parts that are left when you delete all the content. Things like styles and page headers and footers are contained separately from the main content, allowing you to place a good deal of customization in your starting document that then appears in the document you produce.
#T.Poe
I was working on same problem.
new_section = document.add_section() # Added new section for assigning different footer on each page.
sectPr = new_section._sectPr
pgNumType = OxmlElement('w:pgNumType')
pgNumType.set(qn('w:fmt'), 'decimal')
pgNumType.set(qn('w:start'), '1')
sectPr.append(pgNumType)
new_footer = new_section.footer # Get footer-area of the recent section in document
new_footer.is_linked_to_previous = False
footer_para = new_footer.add_paragraph()
run_footer = footer_para.add_run("Your footer here")
_add_number_range(run_footer)
font = run_footer.font
font.name = 'Arial'
font.size = Pt(8)
footer_para.paragraph_format.page_break_before = True
This worked for me :)
I don't know whats not working for you. I just created a new section
Some code for function I used to define field is as follows:
def _add_field(run, field):
""" add a field to a run
"""
fldChar1 = OxmlElement('w:fldChar') # creates a new element
fldChar1.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = field
fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
t = OxmlElement('w:t')
t.text = "Seq"
fldChar2.append(t)
fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')
r_element = run._r
r_element.append(fldChar1)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar4)
def _add_number_range(run):
""" add a number range field to a run
"""
_add_field(run, r'Page')

how to import csv data into django models

I have some CSV data and I want to import into django models using the example CSV data:
1;"02-01-101101";"Worm Gear HRF 50";"Ratio 1 : 10";"input shaft, output shaft, direction A, color dark green";
2;"02-01-101102";"Worm Gear HRF 50";"Ratio 1 : 20";"input shaft, output shaft, direction A, color dark green";
3;"02-01-101103";"Worm Gear HRF 50";"Ratio 1 : 30";"input shaft, output shaft, direction A, color dark green";
4;"02-01-101104";"Worm Gear HRF 50";"Ratio 1 : 40";"input shaft, output shaft, direction A, color dark green";
5;"02-01-101105";"Worm Gear HRF 50";"Ratio 1 : 50";"input shaft, output shaft, direction A, color dark green";
I have some django models named Product. In Product there are some fields like name, description and price. I want something like this:
product=Product()
product.name = "Worm Gear HRF 70(02-01-101116)"
product.description = "input shaft, output shaft, direction A, color dark green"
product.price = 100
You want to use the csv module that is part of the python language and you should use Django's get_or_create method
with open(path) as f:
reader = csv.reader(f)
for row in reader:
_, created = Teacher.objects.get_or_create(
first_name=row[0],
last_name=row[1],
middle_name=row[2],
)
# creates a tuple of the new object or
# current object and a boolean of if it was created
In my example the model teacher has three attributes first_name, last_name and middle_name.
Django documentation of get_or_create method
If you want to use a library, a quick google search for csv and django reveals two libraries - django-csvimport and django-adaptors. Let's read what they have to say about themselves...
django-adaptors:
Django adaptor is a tool which allow you to transform easily a CSV/XML
file into a python object or a django model instance.
django-importcsv:
django-csvimport is a generic importer tool to allow the upload of CSV
files for populating data.
The first requires you to write a model to match the csv file, while the second is more of a command-line importer, which is a huge difference in the way you work with them, and each is good for a different type of project.
So which one to use? That depends on which of those will be better suited for your project in the long run.
However, you can also avoid a library altogether, by writing your own django script to import your csv file, something along the lines of (warning, pseudo-code ahead):
# open file & create csvreader
import csv, yada yada yada
# import the relevant model
from myproject.models import Foo
#loop:
for line in csv file:
line = parse line to a list
# add some custom validation\parsing for some of the fields
foo = Foo(fieldname1=line[1], fieldname2=line[2] ... etc. )
try:
foo.save()
except:
# if the're a problem anywhere, you wanna know about it
print "there was a problem with line", i
It's super easy. Hell, you can do it interactively through the django shell if it's a one-time import. Just - figure out what you want to do with your project, how many files do you need to handle and then - if you decide to use a library, try figuring out which one better suits your needs.
Use the Pandas library to create a dataframe of the csv data.
Name the fields either by including them in the csv file's first line or in code by using the dataframe's columns method.
Then create a list of model instances.
Finally use the django method .bulk_create() to send your list of model instances to the database table.
The read_csv function in pandas is great for reading csv files and gives you lots of parameters to skip lines, omit fields, etc.
import pandas as pd
from app.models import Product
tmp_data=pd.read_csv('file.csv',sep=';')
#ensure fields are named~ID,Product_ID,Name,Ratio,Description
#concatenate name and Product_id to make a new field a la Dr.Dee's answer
products = [
Product(
name = tmp_data.ix[row]['Name'],
description = tmp_data.ix[row]['Description'],
price = tmp_data.ix[row]['price'],
)
for row in tmp_data['ID']
]
Product.objects.bulk_create(products)
I was using the answer by mmrs151 but saving each row (instance) was very slow and any fields containing the delimiting character (even inside of quotes) were not handled by the open() -- line.split(';') method.
Pandas has so many useful caveats, it is worth getting to know
You can also use, django-adaptors
>>> from adaptor.model import CsvModel
>>> class MyCSvModel(CsvModel):
... name = CharField()
... age = IntegerField()
... length = FloatField()
...
... class Meta:
... delimiter = ";"
You declare a MyCsvModel which will match to a CSV file like this:
Anthony;27;1.75
To import the file or any iterable object, just do:
>>> my_csv_list = MyCsvModel.import_data(data = open("my_csv_file_name.csv"))
>>> first_line = my_csv_list[0]
>>> first_line.age
27
Without an explicit declaration, data and columns are matched in the same order:
Anthony --> Column 0 --> Field 0 --> name
27 --> Column 1 --> Field 1 --> age
1.75 --> Column 2 --> Field 2 --> length
For django 1.8 that im using,
I made a command that you can create objects dynamically in the future,
so you can just put the file path of the csv, the model name and the app name of the relevant django application, and it will populate the relevant model without specified the field names.
so if we take for example the next csv:
field1,field2,field3
value1,value2,value3
value11,value22,value33
it will create the objects
[{field1:value1,field2:value2,field3:value3}, {field1:value11,field2:value22,field3:value33}]
for the model name you will enter to the command.
the command code:
from django.core.management.base import BaseCommand
from django.db.models.loading import get_model
import csv
class Command(BaseCommand):
help = 'Creating model objects according the file path specified'
def add_arguments(self, parser):
parser.add_argument('--path', type=str, help="file path")
parser.add_argument('--model_name', type=str, help="model name")
parser.add_argument('--app_name', type=str, help="django app name that the model is connected to")
def handle(self, *args, **options):
file_path = options['path']
_model = get_model(options['app_name'], options['model_name'])
with open(file_path, 'rb') as csv_file:
reader = csv.reader(csv_file, delimiter=',', quotechar='|')
header = reader.next()
for row in reader:
_object_dict = {key: value for key, value in zip(header, row)}
_model.objects.create(**_object_dict)
note that maybe in later versions
from django.db.models.loading import get_model
is deprecated and need to be change to
from django.apps.apps import get_model
The Python csv library can do your parsing and your code can translate them into Products().
something like this:
f = open('data.txt', 'r')
for line in f:
line = line.split(';')
product = Product()
product.name = line[2] + '(' + line[1] + ')'
product.description = line[4]
product.price = '' #data is missing from file
product.save()
f.close()
Write command in Django app. Where you need to provide a CSV file and loop it and create a model with every new row.
your_app_folder/management/commands/ProcessCsv.py
from django.core.management.base import BaseCommand
from django.conf import settings
from your_app_name.models import Product
class Command(BaseCommand):
def handle(self, *args, **options):
with open(os.join.path(settings.BASE_DIR / 'your_csv_file.csv'), 'r') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=';')
for row in csv_reader:
Product.objects.create(name=row[2], description=row[3], price=row[4])
At the end just run the command to process your CSV file and insert it into Product model.
Terminal:
python manage.py ProcessCsv
Thats it.
If you're working with new versions of Django (>10) and don't want to spend time writing the model definition. you can use the ogrinspect tool.
This will create a code definition for the model .
python manage.py ogrinspect [/path/to/thecsv] Product
The output will be the class (model) definition. In this case the model will be called Product.
You need to copy this code into your models.py file.
Afterwards you need to migrate (in the shell) the new Product table with:
python manage.py makemigrations
python manage.py migrate
More information here:
https://docs.djangoproject.com/en/1.11/ref/contrib/gis/tutorial/
Do note that the example has been done for ESRI Shapefiles but it works pretty good with standard CSV files as well.
For ingesting your data (in CSV format) you can use pandas.
import pandas as pd
your_dataframe = pd.read_csv(path_to_csv)
# Make a row iterator (this will go row by row)
iter_data = your_dataframe.iterrows()
Now, every row needs to be transformed into a dictionary and use this dict for instantiating your model (in this case, Product())
# python 2.x
map(lambda (i,data) : Product.objects.create(**dict(data)),iter_data
Done, check your database now.
You can use the django-csv-importer package.
http://pypi.python.org/pypi/django-csv-importer/0.1.1
It works like a django model
MyCsvModel(CsvModel):
field1 = IntegerField()
field2 = CharField()
etc
class Meta:
delimiter = ";"
dbModel = Product
And you just have to:
CsvModel.import_from_file("my file")
That will automatically create your products.
You can give a try to django-import-export. It has nice admin integration, changes preview, can create, update, delete objects.
This is based off of Erik's answer from earlier, but I've found it easiest to read in the .csv file using pandas and then create a new instance of the class for every row in the in data frame.
This example is updated using iloc as pandas no longer uses ix in the most recent version. I don't know about Erik's situation but you need to create the list outside of the for loop otherwise it will not append to your array but simply overwrite it.
import pandas as pd
df = pd.read_csv('path_to_file', sep='delimiter')
products = []
for i in range(len(df)):
products.append(
Product(
name=df.iloc[i][0]
description=df.iloc[i][1]
price=df.iloc[i][2]
)
)
Product.objects.bulk_create(products)
This is just breaking the DataFrame into an array of rows and then selecting each column out of that array off the zero index. (i.e. name is the first column, description the second, etc.)
Hope that helps.
Here's a django egg for it:
django-csvimport
Consider using Django's built-in deserializers. Django's docs are well-written and can help you get started. Consider converting your data from csv to XML or JSON and using a deserializer to import the data. If you're doing this from the command line (rather than through a web request), the loaddata manage.py command will be especially helpful.
define class in models.py and a function in it.
class all_products(models.Model):
def get_all_products():
items = []
with open('EXACT FILE PATH OF YOUR CSV FILE','r') as fp:
# You can also put the relative path of csv file
# with respect to the manage.py file
reader1 = csv.reader(fp, delimiter=';')
for value in reader1:
items.append(value)
return items
You can access ith element in the list as items[i]

Categories

Resources