Uploading a file via paperclip through an external script - python

I'm trying to create a rails app that is a CMS for a client. The app currently has a documents class that uploads the document with paperclip.
Separate to this, we're running a python script that accesses the database and gets a bunch of information for a given event, creates a proposal word document, and uploads it to the database under the correct event.
This all works, but the app does not recognize the document. How do I make a python script that will correctly upload the document such that paperclip knows what's going on?
Here is my paperclip controller:
def new
#event = Event.find(params[:event_id])
#document = Document.new
end
def create
#event = Event.find(params[:event_id])
#document = #event.documents.new(document_params)
if #document.save
redirect_to event_path(#event)
end
end
private
def document_params
params.require(:document).permit(:event_id, :data, :title)
end
Model
validates :title, presence: true
has_attached_file :data
validates_attachment_content_type :data, :content_type => ["application/pdf", "application/msword"]
Here is the python code.
f = open(propStr, 'r')
binary = psycopg2.Binary(f.read())
self.cur.execute("INSERT INTO documents (event_id, title, data_file_name, data_content_type) VALUES (%d,'Proposal.doc',%s,'application/msword');" % (self.eventData[0], binary))
self.con.commit()

You should probably use Ruby to script this since it can load in any model information or other classes you need.
But assuming your requirements dictate the use of python, be aware that Paperclip does not store the documents in your database tables, only the files' metadata. The actual file is stored in your file system in the /public dir by default (could also be s3, etc depending on your configuration). I would make sure you were actually saving the file to the correct anticipated directory. The default path according to the docs is:
:rails_root/public/system/:class/:attachment/:id_partition/:style/:filename
so you will have to make another sql query to retrieve the id of your new record. I don't believe pdfs have a :style attribute since you don't use imagicmagick to resize them, so build a path that looks something like this:
/public/system/documents/data/000/000/123/my_file.pdf
and save it from your python script.

Related

Python script uses json database; is it possible to have multiple instances of the script writing to the file?

The script has a database.json file that it reads and writes to with every operation it does. It's a simple database, 'two columns' one with text (includes hyphens, numbers, and underscores) and an associated number. There's about 40,000 rows though.
The script as of right now can only work in the database at one instance. I know this because I had two instances running and when I checked the database half of it was missing, so it would seem the script would erase the data from the other script. Long story short I had to rebuild it from a backup, but I have a question:
Can a Python script access a json database with read and write in multiple instances? or must the script be written with some more complex database functionality?
Here are some snippets of the code that interacts with the database:
def load_database():
# If file exists load the json as a dict
if os.path.isfile('database.json'):
return json.load(open('database.json', 'r'))
# If file doesn't exist return empty dict because the script hasn't run yet
else:
return {}
------
def save_database():
json.dump(database, open('database.json', 'w'))
------
self.user_data = self.get_user_data()
if self.user_data:
self.user_id = str(self.user_data['id'])
if self.user_id in database:
if database[self.user_id] != self.username:
main_log.info('User {} changed their username to {}.'.format(database[self.user_id], self.username))
os.rename('Users/{}_{}'.format(database[self.user_id], self.user_id), 'Users/{}_{}'.format(self.username, self.user_id))
add_username_to_history('Users/{}_{}/'.format(self.username, self.user_id), database[self.user_id])
database[self.user_id] = self.username
else:
safe_mkdir('Users/{}_{}'.format(self.username, self.user_id))
else:
safe_mkdir('Users/{}_{}'.format(self.username, self.user_id))
main_log.info('Added {} with ID:{} to the database.'.format(self.username, self.user_id))
database.update({self.user_id: self.username})
# Create the user log obj
self.user_log = Logger('Users/{}_{}/log.log'.format(self.username, self.user_id))
# Create ID File
create_id_file('Users/{}_{}/'.format(self.username, self.user_id), self.user_id)
save_database()
There's a few more instances where the database is called, but those give an idea.
The reason I wanted to run multiple instances was to double up the speed on scraping. I already have ThreadPool and even with 200+ threads it's not as fast as it could be by having multiple instances.
What you need is the concept of atomicity.
Since your json file is not too big, one not-ideal-but-workable option could be :
When needed to read: open file, read values into variable(s), close connection i.e. do not keep the json open
Repeat the step 1 everytime you need to read the database
When needed to write: open file, write values into json, close connection i.e. do not keep the json open
Repeat step 2 everytime you need to run any DML
I used a Redis database. This allowed the script to individually write to the database thus allowing multiple instances of the script to be running.

Can LibreOffice/OpenOffice programmatically add passwords to existing .docx/.xlsx/.pptx files?

TL;DR version - I need to programmatically add a password to .docx/.xlsx/.pptx files using LibreOffice and it doesn't work, and no errors are reported back either, my request to add a password is simply ignored, and a password-less version of the same file is saved.
In-depth:
I'm trying to script the ability to password-protect existing .docx/.xlsx/.pptx files using LibreOffice.
I'm using 64-bit LibreOffice 6.2.5.2 which is the latest version at the time of writing, on Windows 8.1 64-bit Professional.
Whilst I can do this manually via the UI - specifically, I open the "plain" document, do "Save As" and then tick "Save with Password", and enter the password in there, I cannot get this to work via any kind of automation. I'm been trying via Python/Uno, but to no gain. Although the code below correctly opens and saves the document, my attempt to add a password is completely ignored. Curiously, the file size shrinks from 12kb to 9kb when I do this.
Here is my code:
import socket
import uno
import sys
localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
ctx = resolver.resolve( "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext" )
smgr = ctx.ServiceManager
desktop = smgr.createInstanceWithContext( "com.sun.star.frame.Desktop",ctx)
from com.sun.star.beans import PropertyValue
properties=[]
oDocB = desktop.loadComponentFromURL ("file:///C:/Docs/PlainDoc.docx","_blank",0, tuple(properties) )
sp=[]
sp1=PropertyValue()
sp1.Name='FilterName'
sp1.Value='MS Word 2007 XML'
sp.append(sp1)
sp2=PropertyValue()
sp2.Name='Password'
sp2.Value='secret'
sp.append(sp2)
oDocB.storeToURL("file:///C:/Docs/PasswordDoc.docx",sp)
oDocB.dispose()
I've had great results using Python/Uno to open password-protected files, but I cannot get it to protect a previously unprotected document. I've tried enabling the macro recorder and recording my actions - it recorded the following LibreOffice BASIC code:
sub SaveDoc
rem ----------------------------------------------------------------------
rem define variables
dim document as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
rem ----------------------------------------------------------------------
dim args1(2) as new com.sun.star.beans.PropertyValue
args1(0).Name = "URL"
args1(0).Value = "file:///C:/Docs/PasswordDoc.docx"
args1(1).Name = "FilterName"
args1(1).Value = "MS Word 2007 XML"
args1(2).Name = "EncryptionData"
args1(2).Value = Array(Array("OOXPassword","secret"))
dispatcher.executeDispatch(document, ".uno:SaveAs", "", 0, args1())
end sub
Even when I try to run that, it...saves an unprotected document, with no password encryption. I've even tried converting the macro above into the equivalent Python code, but to no avail either. I don't get any errors, it simply doesn't protect the document.
Finally, out of desperation, I've even tried other approaches that don't include LibreOffice, for example, using the Apache POI library as per the following existing StackOverflow question:
Python or LibreOffice Save xlsx file encrypted with password
...but I just get an error saying "Error: Could not find or load main class org.python.util.jython". I've tried upgrading my JDK, tweaking the paths used in the example, i.e. had an "intelligent" go, but still no joy. I suspect the error above is trivial to fix, but I'm not a Java developer and lack the experience in this area.
Does anyone have any solution? Do you have some LibreOffice code that can do this (password-protect .docx/.xlsx/.pptx files)? Or OpenOffice for that matter, I'm not precious about which package I use. Or something else entirely!
NOTE: I appreciate this is trivial using full-fat Microsoft Office, but thanks to Microsoft's licensing restrictions, is a complete no-go for this project - I have to use an alternative.
The following example is from page 40 (file page 56) of Useful Macro Information
For OpenOffice.org by Andrew Pitonyak (http://www.pitonyak.org/AndrewMacro.odt). The document is directed to OpenOffice.org Basic but is generally applicable to LibreOffice as well. The example differs from the macro recorder version primarily in its use of the documented API rather than dispatch calls.
5.8.3. Save a document with a password
To save a document with a password, you must set the “Password”
attribute.
Listing 5.19: Save a document using a password.
Sub SaveDocumentWithPassword
Dim args(0) As New com.sun.star.beans.PropertyValue
Dim sURL$
args(0).Name ="Password"
args(0).Value = "test"
sURL=ConvertToURL("/andrew0/home/andy/test.odt")
ThisComponent.storeToURL(sURL, args())
End Sub
The argument name is case sensitive, so “password” will not work.

How to update/add data to Django application without deleting data

Right now, I have a Django application with an import feature which accepts a .zip file, reads out the csv files and formats them to JSON and then inserts them into the database. The JSON file with all the data is put into temp_dir and is called data.json.
Unfortunatly, the insertion is done like so:
Building.objects.all().delete()
call_command('loaddata', os.path.join(temp_dir, 'data.json'))
My problem is that all the data is deleted then re-added. I need to instead find a way to update and add data and not delete the data.
I've been looking at other Django commands but I can't seem to find out that would allow me to insert the data and update/add records. I'm hoping that there is a easy way to do this without modifying a whole lot.
If you loop through your data you could use get_or_create(), this will return the object if it exist and create it if it doesn't:
obj, created = Person.objects.get_or_create(first_name='John', last_name='Lennon', defaults={'birthday': date(1940, 10, 9)})

Python: sharing sqlite cursor with two classes only, plus the database

I have a script built using procedural programming that uses a sqlite database file. The script processes a CSV file, then uses a standard cursor to pass its particulars to a single SQLite DB.
After this, the script extracts from the DB to produce a number of spreadsheets in Excel via xlwt.
The problem with this is that the script only handles one input file at a time, whereas I will need to be iterating through about 70-90 of these files on any given day.
I've been trying to rewrite the script as object-oriented, but I'm having trouble with sharing the cursor.
The original input file comes in a zip archive that I would extract via Linux or Mac OS X command lines. Previously this was done manually; now I've managed to write classes and loop through totally ad-hoc numbers of multiple input files via the multi version of tkfiledialog.
Furthermore the original input file (ie one of the 70-90) is a text file in csv format with a DRF extension (obv., not really important) that gets picked by a simple tkfiledialog box:
FILENAMER = tkFileDialog.askopenfilename(title="Open file", \
filetypes=[("txt file",".DRF"),("txt file", ".txt"),\
("All files",".*")])
The DRF file itself is issued daily by location and date, ie 'BHP0123.DRF' is for the BHP location, issued for 23 January. To keep everything as straightforward as possible the procedural script further decomposes the DRF to just the BHP0123 part or prefix, then uses it to build a SQLite DB.
FBASENAME = os.path.basename(FILENAMER)
FBROOT = os.path.splitext(FBNAMED)[0]
OUTPUTDATABASE = 'sqlite_' + FBROOT + '.db'
Basically with the program as a procedural script I just had to create one DB, one connection and one cursor, which could be shared by all the functions in the script:
conn = sqlite3.connect(OUTPUTDATABASE) # <-- originally :: Is this the core problem?
curs = conn.cursor()
conn.text_factory = sqlite3.OptimizedUnicode
In the procedural version these variables above are global.
Procedurally I have
1) one function to handle formatting, and
2) another to handle the calculations needed. The DRF is indexed with about 2500 fields per row; I discard the majority and only use about 400-500 of these per row.
The formatting function parses out the CSV via a for-loop (discards junk characters, incomplete data, etc), then passes the formatted data for the calculator to process and chew on. The core problem seems to be that on the one hand I need the DB connection to be constant for each input DRF file, but on the other that connection can only be shared by the formatter and calculator, and 'regenerated' for each DRF.
Crucially, I've tried to rewrite as little of the formatter and calculator as possible.
I've tried to create a separate dbcxn class, then create an instance to share, but I'm confused as to how to handle the output DB situation with the cursor (and pass it intact to both formatter and calculator):
class DBcxn(object):
def __init__(self, OUTPUTDATABASE):
OUTPUTDATABASE = ?????
self.OUTPUTDATABASE = OUTPUTDATABASE
def init_db_cxn(self, OUTPUTDATABASE):
conn = sqlite3.connect(OUTPUTDATABASE) # < ????
self.conn = conn
curs = conn.cursor()
self.curs = curs
conn.text_factory = sqlite3.OptimizedUnicode
dbtest = DBcxn( ???? )
If anyone might suggest a way of untangling this I'd be very grateful. Please let me know if you need more information.
Cheers
Massimo Savino

storing full text from txt file into mongodb

I have created a python script that automates a workflow converting PDF to txt files. I want to be able to store and query these files in MongoDB. Do I need to turn the .txt file into JSON/BSON? Should I be using a program like PyMongo?
I am just not sure what the steps of such a project would be let alone the tools that would help with this.
I've looked at this post: How can one add text files in Mongodb?, which makes me think I need to convert the file to a JSON file, and possibly integrate GridFS?
You don't need to JSON/BSON encode it if you're using a driver. If you're using the MongoDB shell, you'd need to worry about it when you pasted the contents.
You'd likely want to use the Python MongoDB driver:
from pymongo import MongoClient
client = MongoClient()
db = client.test_database # use a database called "test_database"
collection = db.files # and inside that DB, a collection called "files"
f = open('test_file_name.txt') # open a file
text = f.read() # read the entire contents, should be UTF-8 text
# build a document to be inserted
text_file_doc = {"file_name": "test_file_name.txt", "contents" : text }
# insert the contents into the "file" collection
collection.insert(text_file_doc)
(Untested code)
If you made sure that the file names are unique, you could set the _id property of the document and retrieve it like:
text_file_doc = collection.find_one({"_id": "test_file_name.txt"})
Or, you could ensure the file_name property as shown above is indexed and do:
text_file_doc = collection.find_one({"file_name": "test_file_name.txt"})
Your other option is to use GridFS, although it's often not recommended for small files.
There's a starter here for Python and GridFS.
Yes, you must convert your file to JSON. There is a trivial way to do that: use something like {"text": "your text"}. It's easy to extend / update such records later.
Of course you'd need to escape the " occurences in your text. I suppose that you use a JSON library and/or MongoDB library of your favorite language to do all the formatting.

Categories

Resources