I have generated my first table with the python pywin32 package. I would like to add another table after the first one. Can anyone help me on that?
create the first table with 6 rows and 4 columns:
from win32com.client import Dispatch,constants
mw = Dispatch('Word.Application')
mw.Visible = 1
md = mw.Documents.Add(Template = MYDIR + '\\Template for tests.docx')
rng = md.Range(0,0)
tabletu = md.Tables.Add(rng,6,4)
To create the next table, what should rng be? How can I set up my Range object? Any tutorial on that?
Also how could I close and save it properly? I used:
filename = "CPM Production FAT Procedures.docx"
md.SaveAs(filename)
But each time it increases the document number.
Thanks,
win32com is just a wrapper for Microsoft's COM API. All the functions and properties that you are calling are part of the COM API for Word. You will find that API extensively documented here:
Word 2013 developer reference
You might find the article Working with Range Objects particularly instructive in this case.
All of the examples will be in VB, but it's fairly trivial to read across to Python/win32com.
For your particular problem something like the following should work:
rng = md.Range(md.Content.End-1, md.Content.End)
md.Paragraphs.Add(rng)
rng = md.Range(md.Content.End-1, md.Content.End)
another_table = md.Tables.Add(rng,6,4)
As for your saving issue, I can't reproduce the problem. If I repeatedly save with the same filename, I see the same file being over-written.
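One thing worth ruling out (a guess only, since the behaviour doesn't reproduce here): if you pass SaveAs a bare filename, Word resolves it against its own current folder, which can vary between sessions. Building an absolute path first removes that variable. A minimal sketch, where `MYDIR` is a stand-in for your own directory:

```python
import ntpath  # Windows-style path joining, deterministic on any OS

MYDIR = r"C:\docs"  # stand-in for your own directory
filename = "CPM Production FAT Procedures.docx"
full_path = ntpath.join(MYDIR, filename)
print(full_path)  # C:\docs\CPM Production FAT Procedures.docx
# md.SaveAs(full_path)  # hand Word the absolute path
```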
Related
I am trying to connect the Knack online database with my Python data-handling scripts in order to renew objects/tables directly in my Knack app builder. I discovered pyknackhq, a Python API for KnackHQ that can fetch objects and return JSON for the object's records. So far so good.
However, following the documentation (http://www.wbh-doc.com.s3.amazonaws.com/pyknackhq/quick%20start.html) I have tried to fetch all rows (records in Knack) for my object-table (344 records in total).
My code was:
i = 0
for rec in undec_obj.find():
    print(rec)
    i = i + 1
print(i)
>> 25
All of the first 25 records were returned indeed; however, the rest, up to the 344th, were never returned. The documentation of the pyknackhq library is relatively small, so I couldn't find a way around my problem there. Is there a solution to get all my records/rows? (I have also changed the settings in Knack so that all my records appear on the same page - page 1.)
The ultimate goal is to take all records and make them a pandas dataframe.
thank you!
I haven't worked with that library, but I've written another Python wrapper for the Knack API that should help:
https://github.com/cityofaustin/knackpy
The docs should get you where you want to go. Here's an example:
>>> from knackpy import Knack
# download data from knack object
# will fetch records in chunks of 1000 until all records have been downloaded
# optionally pass a rows_per_page and/or page_limit parameter to limit record count
>>> kn = Knack(
...     obj='object_3',
...     app_id='someappid',
...     api_key='topsecretapikey',
...     page_limit=10,        # not needed; this is the default
...     rows_per_page=1000    # not needed; this is the default
... )
>>> for row in kn.data:
...     print(row)
{'store_id': 30424, 'inspection_date': 1479448800000, 'id': '58598262bcb3437b51194040'},...
Hope that helps. Open a GitHub issue if you have any questions using the package.
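If you do stick with raw calls to the Knack API instead, the 25-record cut-off you are seeing is almost certainly pagination: Knack returns one page of records per request. The generic loop looks like the sketch below, where `fetch_page` is a stand-in for the real HTTP call:

```python
def fetch_all(fetch_page, rows_per_page=25):
    """Collect every record by requesting successive pages until one comes back short."""
    records, page = [], 1
    while True:
        batch = fetch_page(page, rows_per_page)
        records.extend(batch)
        if len(batch) < rows_per_page:  # last page reached
            break
        page += 1
    return records

# Stand-in for the real API call: 60 fake records served 25 at a time.
fake_db = [{'id': i} for i in range(60)]

def fetch_page(page, rows_per_page):
    start = (page - 1) * rows_per_page
    return fake_db[start:start + rows_per_page]

print(len(fetch_all(fetch_page)))  # 60
```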
I am a newbie to the Smartsheet Python SDK. Using the sample code from the Smartsheet API doc as a starting point:
action = smartsheet.Sheets.list_sheets(include_all=True)
sheets = action.data
This code returns a response just fine.
I am now looking for some simple examples to iterate over the sheets ie:
for sheet in sheets:
then select a sheet by name
then iterate over the rows in the selected sheet and select a row.
for row in rows:
then retrieve a cell value from the selected row in the selected sheet.
I just need some simple samples to get started. I have searched far and wide and have been unable to find any simple examples of how to do this.
Thanks!
As Scott said, a sheet could return a lot of data, so make sure that you use filters judiciously. Here is an example of some code I wrote to pull two rows but only one column in each row:
action = smartsheet.Sheets.get_sheet(SHEET_ID, column_ids=COL_ID, row_numbers="2,4")
Details on the available filters can be found here.
UPDATE: more code added in order to follow site etiquette and provide a complete answer.
The first thing I did while learning the API is display a list of all my sheets and their corresponding sheetId.
action = MySS.Sheets.list_sheets(include_all=True)
for single_sheet in action.data:
    print single_sheet.id, single_sheet.name
From that list I determined the sheetId for the sheet I want to pull data from. In my example, I actually needed to pull the primary column, so I used this code to determine the Id of the primary column (and also saved the non-primary column Ids in a list because at the time I thought I might need them):
PrimaryCol = 0
NonPrimaryCol = []
MyColumns = MySS.Sheets.get_columns(SHEET_ID)
for MyCol in MyColumns.data:
    if MyCol.primary:
        print "Found primary column", MyCol.id
        PrimaryCol = MyCol.id
    else:
        NonPrimaryCol.append(MyCol.id)
Lastly, keeping in mind that retrieving an entire sheet could return a lot of data, I used a filter to return only the data in the primary column:
MySheet = MySS.Sheets.get_sheet(SHEET_ID, column_ids=PrimaryCol)
for MyRow in MySheet.rows:
    for MyCell in MyRow.cells:
        print MyRow.id, MyCell.value
Below is a very simple example. Most of this is standard Python, but one somewhat non-intuitive thing is that the sheet objects in the list returned by smartsheet.Sheets.list_sheets don't include the rows and cells. As that could be a lot of data, list_sheets returns information about each sheet, which you can then use to retrieve a sheet's complete data by calling smartsheet.Sheets.get_sheet.
To better understand things like this, be sure to keep the Smartsheet REST API reference handy. Since the SDK is really just calling this API under the covers, you can often find more information by looking at that documentation as well.
action = smartsheet.Sheets.list_sheets(include_all=True)
sheets = action.data
for sheetInfo in sheets:
    if sheetInfo.name == 'WIP':
        sheet = smartsheet.Sheets.get_sheet(sheetInfo.id)
        for row in sheet.rows:
            if row.row_number == 2:
                for c in range(0, len(sheet.columns)):
                    print row.cells[c].value
I started working with Python APIs through Smartsheet. Because we used Smartsheet to back some of our RIO2016 Olympic Games operations, every now and then we had to delete the oldest sheets to stay within our licence limits. That used to be a blunder: log in, select each sheet among 300, check every field, and so on. Thanks to the Smartsheet API 2.0, we could easily learn how many sheets we had used so far, get all the 'modified' dates, sort by that column from oldest to most recent, and write the result to a CSV file on disk. I am not sure this is the best approach, but it worked as I expected. I use Idle-python2.7 on Debian 8.5. Here you are:
# -*- coding: utf-8 -*-
#!/usr/bin/python
'''
create instance of Sheet Object.
Then populate List of Sheet Object with name and modified
A token is necessary to access Smartsheets
We create and return a list of all objects with fields aforesaid.
'''
# The Library
import smartsheet, csv
'''
Token long var. This token can be obtained in
Account->Settings->Apps...->API
from a valid SmartSheet Account.
'''
xMyToken = 'xxxxxxxxxxxxxxxxxxxxxx'
# Smartsheet Token
xSheet = smartsheet.Smartsheet(xMyToken)
# Class object
xResult = xSheet.Sheets.list_sheets(include_all=True)
# The list
xList = []
'''
for each sheet element we keep two fields, namely name and date of modification. As most of our vocabulary has special characters, we encode the name of each spreadsheet as UTF-8. So, for each sheet read from the sheets object:
'''
for sheet1 in xResult.data:
    xList.append((sheet1._name.encode('utf-8'), sheet1._modified_at))
# sort the list created by 'Modifiedat' attribute
xNlist = sorted(xList,key=lambda x: x[1])
# print list
for key, value in xNlist:
    print key, value
# Finally write to disk
with open("listofsmartsh.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(xNlist)
Hope you enjoy.
regards
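The core of the script - sorting (name, modified) tuples by date and writing them to CSV - can be sketched without the Smartsheet dependency at all (Python 3 here, so the file would be opened in text mode rather than "wb"; the sheet names and dates below are made up):

```python
import csv
import io

sheets = [
    ("Budget", "2016-07-01"),
    ("Venues", "2016-03-15"),
    ("Staff", "2016-05-20"),
]

# sort by the 'modified' field, oldest first
ordered = sorted(sheets, key=lambda pair: pair[1])

# io.StringIO stands in for open("listofsmartsh.csv", "w", newline="")
out = io.StringIO()
csv.writer(out).writerows(ordered)
print(out.getvalue())
```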
I've crawled a tracklist of 36,000 songs that have been played on the Danish national radio station P3. I want to do some statistics on how frequently each genre has been played within this period, so I figured the Discogs API might help with labeling each track with a genre. However, the documentation for the API doesn't seem to include an example of querying the genre of a particular song.
I have a CSV file with 3 columns: Artist, Title & Test (Test is where I want the API to label each song with its genre).
Here's a sample of the script I've built so far:
import json
import pandas as pd
import requests
import discogs_client
d = discogs_client.Client('ExampleApplication/0.1')
d.set_consumer_key('key-here', 'secret-here')
input = pd.read_csv('Desktop/TEST.csv', encoding='utf-8',error_bad_lines=False)
df = input[['Artist', 'Title', 'Test']]
df.columns = ['Artist', 'Title','Test']
for i in range(0, len(list(df.Artist))):
    x = df.Artist[i]
    g = d.artist(x)
    df.Test[i] = str(g)
df.to_csv('Desktop/TEST2.csv', encoding='utf-8', index=False)
This script has worked so far with a dummy file of 3 records, mapping the artist for a given ID#. But as soon as the file gets larger (e.g. 2000 records), it returns an HTTPError when it cannot find an artist.
I have some questions regarding this approach:
1) Would you recommend using the search query function in the API to retrieve a variable such as 'Genre'? Or do you think it is possible to retrieve the genre with a 'd.' function from the API?
2) Will I need to acquire an API key? I have successfully mapped the 3 records without an API key so far. It looks like the key is free, though.
Here's the guide I have been following:
https://github.com/discogs/discogs_client
And here's the documentation for the API:
https://www.discogs.com/developers/#page:home,header:home-quickstart
Maybe you need to re-read the discogs_client examples; I am not an expert myself, just a newbie trying to use this API.
AFAIK, g = d.artist(x) fails because x must be an integer, not a string.
So you must first do a search, then get the artist id, then call d.artist(artist_id).
Sorry for not providing an example; I am a Python newbie right now ;)
Also, have you checked AcoustID?
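The search-then-fetch flow can be wrapped in a small helper. This is a sketch only: `artist_id_for` is a hypothetical name, and the fake client below stands in for a real `discogs_client.Client` so the flow can be shown without network access.

```python
from types import SimpleNamespace

def artist_id_for(client, name):
    """Search for an artist by name and return the id of the first hit, or None."""
    results = client.search(name, type='artist')
    return results[0].id if results else None

# Fake client standing in for discogs_client.Client('ExampleApplication/0.1')
class FakeClient:
    def search(self, query, type=None):
        return [SimpleNamespace(id=45, name=query)]

print(artist_id_for(FakeClient(), 'Aphex Twin'))  # 45
# With the real client you would then call d.artist(artist_id).
```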
It's probably a rate limit.
Read the status code of your response; you should find a 429 Too Many Requests.
Unfortunately, if that's the case, the only solution is to add a sleep in your code to make one request per second.
Check out the API doc:
http://www.discogs.com/developers/#page:home,header:home-rate-limiting
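The one-request-per-second workaround is simple to sketch. `Throttle` below is a hypothetical helper (not part of any library); with the real client you would wrap each d.artist(...) call this way:

```python
import time

class Throttle:
    """Space successive calls at least min_interval seconds apart."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last = 0.0

    def call(self, func):
        # Sleep if the previous call finished less than min_interval ago
        wait = self.last + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        try:
            return func()
        finally:
            self.last = time.monotonic()

throttle = Throttle(min_interval=0.01)  # use 1.0 for the Discogs limit
results = [throttle.call(lambda: 'ok') for _ in range(3)]
print(results)  # ['ok', 'ok', 'ok']
```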
I found this guide:
https://github.com/neutralino1/discogs_client.
Access the API with your key and try something like:
d = discogs_client.Client('something.py', user_token=auth_token)
release = d.release(774004)
genre = release.genres
If you find a better solution, please share.
I haven't used Python before, but am told this would be a good language for this process. I have a list of lat/long coordinates that I need to convert into business names. Any ideas where I might find documentation on how to complete this process?
Example:
Doc: LatLong.txt (has a list of lat/long pairs separated into columns)
I need to run that list against the Places API with a max radius of 30 and return any businesses (BusinessName, Addy, Phone, etc.) within that radius.
from googleplaces import GooglePlaces

YOUR_API_KEY = 'aa262fad30e663fca7c7a2be9354fe9984f0b5f2'
google_places = GooglePlaces(YOUR_API_KEY)
query_result = google_places.nearby_search(
    lat_lng={'lat': 41.802893, 'lng': -89.649930}, radius=20000)
for place in query_result.places:
    print (place.name)
    print (place.geo_location)
    print (place.reference)
    place.get_details()
    print (place.local_phone_number)
    print (place.international_phone_number)
    print (place.website)
    print (place.url)
is what I'm playing around with.
Google has some pretty good documentation for getting started:
https://developers.google.com/places/training/basic-place-search
Also this answer may help: https://stackoverflow.com/a/21924452/1393496
You should edit your question and include more detail: what have you tried? What worked? What didn't?
I am looking for a way to extract / scrape data from Word files into a database. Our corporate procedures have Minutes of Meetings with clients documented in MS Word files, mostly due to history and inertia.
I want to be able to pull the action items from these meeting minutes into a database so that we can access them from a web-interface, turn them into tasks and update them as they are completed.
Which is the best way to do this:
VBA macro from inside Word to create CSV and then upload to the DB?
VBA macro in Word with connection to DB (how does one connect to MySQL from VBA?)
Python script via win32com then upload to DB?
The last one is attractive to me as the web-interface is being built with Django, but I've never used win32com or tried scripting Word from python.
EDIT: I've started extracting the text with VBA because it makes it a little easier to deal with the Word object model. I am having a problem, though: all the text is in tables, and when I pull the strings out of the cells I want, I get a strange little box character at the end of each string. My code looks like:
sFile = "D:\temp\output.txt"
fnum = FreeFile
Open sFile For Output As #fnum
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
    Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
    Assign = Application.ActiveDocument.Tables(2).Cell(n, 3).Range.Text
    Target = Application.ActiveDocument.Tables(2).Cell(n, 4).Range.Text
    If Target = "" Then
        ExportText = ""
    Else
        ExportText = Descr & Chr(44) & Assign & Chr(44) & _
            Target & Chr(13) & Chr(10)
        Print #fnum, ExportText
    End If
Next n
Close #fnum
What's up with the little control character box? Is some kind of character code coming across from Word?
Word has a little marker thingy that it puts at the end of every cell of text in a table.
It is used just like an end-of-paragraph marker in paragraphs: to store the formatting for the entire paragraph.
Just use the Left() function to strip it out, i.e.
Left(Target, Len(Target) - 1)
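The same marker shows up if you read the cells from Python via win32com: each cell's Range.Text ends with '\r\x07' (carriage return plus the end-of-cell marker, which renders as that little box). A small helper to strip it - a sketch, not part of any API:

```python
def clean_cell(text):
    """Strip Word's end-of-cell marker (CR + BEL, '\r\x07') from a table cell's text."""
    return text.rstrip('\r\x07')

print(clean_cell('Finish report\r\x07'))  # Finish report
```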
By the way, instead of
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
Try this:
For Each row In Application.ActiveDocument.Tables(2).Rows
    Descr = row.Cells(2).Range.Text
Well, I've never scripted Word, but it's pretty easy to do simple stuff with win32com. Something like:
from win32com.client import Dispatch
word = Dispatch('Word.Application')
doc = word.Documents.Open('d:\\stuff\\myfile.doc')
doc.SaveAs(FileName='d:\\stuff\\text\\myfile.txt', FileFormat=?) # not sure what to use for ?
This is untested, but I think something like that will just open the file and save it as plain text (provided you can find the right fileformat) – you could then read the text into python and manipulate it from there. There is probably a way to grab the contents of the file directly, too, but I don't know it off hand; documentation can be hard to find, but if you've got VBA docs or experience, you should be able to carry them across.
Have a look at this post from a while ago: http://mail.python.org/pipermail/python-list/2002-October/168785.html Scroll down to COMTools.py; there's some good examples there.
You can also run makepy.py (part of the pythonwin distribution) to generate python "signatures" for the COM functions available, and then look through it as a kind of documentation.
You could use OpenOffice. It can open word files, and also can run python macros.
I'd say look at the related questions on the right -->
The top one seems to have some good ideas for going the python route.
How about saving the file as XML? Then use Python or something else to pull the data out of Word and into the database.
It is possible to programmatically save a Word document as HTML and to import the table(s) contained into Access. This requires very little effort.