I'm hoping my problem can be solved with some geojson expertise. The problem I'm having has to do with RhinoPython - the embedded IronPython engine in McNeel's Rhino 5 (more info here: http://python.rhino3d.com/). I don't think it's necessary to be an expert on RhinoPython to answer this question.
I'm trying to load a geojson file in RhinoPython. Because you can't import the geojson module into RhinoPython as you can in regular Python, I'm using the custom module GeoJson2Rhino provided here: https://github.com/localcode/rhinopythonscripts/blob/master/GeoJson2Rhino.py
Right now my script looks like this:
`import rhinoscriptsyntax as rs
import sys
rp_scripts = "rhinopythonscripts"
sys.path.append(rp_scripts)
import rhinopythonscripts
import GeoJson2Rhino as geojson
layer_1 = rs.GetLayer(layer='Layer 01')
layer_color = rs.LayerColor(layer_1)
f = open('test_3.geojson')
gj_data = geojson.load(f,layer_1,layer_color)
f.close()`
In particular:
f = open('test_3.geojson')
gj_data = geojson.load(f)
works fine when I'm trying to extract geojson data from regular python 2.7. However in RhinoPython I'm getting the following error message: Message: expected string for parameter 'text' but got 'file'; in reference to gj_data = geojson.load(f).
I've been looking at the GeoJson2Rhino script linked above and I think I've set the parameters for the function correctly. As far as I can tell it doesn't seem to recognize my geojson file, and wants it as a string. Is there an alternative file open function I can use to get the function to recognize it as a geojson file?
Judging by the error message, it looks like the load method requires a string as the first input but in the above example a file object is being passed instead. Try this...
f = open('test_3.geojson')
g = f.read()  # read the contents of 'f' into a string
gj_data = geojson.load(g)
...or, if you don't actually need the file object...
g = open('test_3.geojson').read() # get the contents of the geojson file directly
gj_data = geojson.load(g)
See here for more information about reading files in python.
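If the same loading code has to cope with both open files and plain strings, one option is to normalize the input first. The following is a minimal sketch, not part of GeoJson2Rhino: as_text is a hypothetical helper, the sample data stands in for test_3.geojson, and json.loads stands in for any loader that expects text.

```python
import io
import json


def as_text(source):
    """Return the text of `source`, which may be a string or an open file-like object."""
    if hasattr(source, "read"):
        return source.read()
    return source


# Stand-in data: a tiny GeoJSON document held in memory instead of test_3.geojson.
raw = u'{"type": "Point", "coordinates": [1.0, 2.0]}'
sample_file = io.StringIO(raw)

# json.loads stands in here for a loader that expects text rather than a file object.
from_file = json.loads(as_text(sample_file))
from_string = json.loads(as_text(raw))
```

Either kind of input ends up as a string before it reaches the loader, so the "expected string ... but got 'file'" situation does not arise.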
Related
So, as the question suggests, I am looking for a way to convert epub2 to epub3 using Python.
What I found so far is how to convert PDF documents to EPUB2 using ASPOSE.PDF for Java (in the python wrapper) and using ASPOSE Word Cloud. The next step would be to manually convert it to epub3 using Sigil, Calibre, or epub3-itzer. However, I would like a Python script that could do it automatically, similar to the following:
import asposewordscloud
from asposewordscloud import WordsApi
from asposewordscloud.models.requests import ConvertDocumentRequest
app_sid = 'my_app_id'
app_secret = 'my_secret'
words_api = WordsApi(app_sid, app_secret)
with open('rainer_docs.pdf', 'rb') as f:
    request = ConvertDocumentRequest(f, format='epub')
    result = words_api.convert_document(request)
print('Output filename : {}'.format(result))
I know this is a bit of a hacky method, but it worked for me. I would like something similar to this, but simply to convert from epub2 to epub3 using Python.
Sorry if the question is not well formulated; I will reformulate it if necessary.
I have a file with an array that I filled with data from an online json db. I imported this array into another file to use its data.
#file1
response = urlopen(url1)
a=[]
data = json.loads(response.read())
for i in range(len(data)):
    a.append(data[i]['name'])
    i+=1
#file2
from file1 import a
'''do something with "a"'''
Does importing the array mean I'm filling the array each time I call it in file2?
If that is the case, what can I do to just keep the data from the array without "building" it each time I call it?
If you save a to a file, you will not need to rebuild it each time; you can just read it back. For example, here's one way to open a text file and get the text from the file:
# set a variable to be the open file
OpenFile = open(file_path, "r")
# set a variable to be everything read from the file, then you can act on that variable
file_guts = OpenFile.read()
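Applied to the list from the question, the same idea could look like the sketch below. It is only an illustration under stated assumptions: load_names, fake_fetch and names.json are hypothetical names, and fake_fetch stands in for the urlopen/json.loads code in file1.

```python
import json
import os
import tempfile


def load_names(cache_path, fetch):
    """Return the cached list if the cache file exists; otherwise call fetch() and cache its result."""
    if os.path.exists(cache_path):
        with open(cache_path, "r") as handle:
            return json.load(handle)
    names = fetch()
    with open(cache_path, "w") as handle:
        json.dump(names, handle)
    return names


calls = []


def fake_fetch():
    """Hypothetical stand-in for the code that downloads and parses the remote data."""
    calls.append(1)
    return ["alice", "bob"]


cache_path = os.path.join(tempfile.mkdtemp(), "names.json")
first = load_names(cache_path, fake_fetch)   # cache miss: fetches and writes the file
second = load_names(cache_path, fake_fetch)  # cache hit: reads the file, no fetch
```

The trade-off is that the cached copy can go stale, so the cache file has to be deleted or refreshed whenever the remote data should be read again.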
From the Modules section of the Python docs, you can read:
When you run a Python module with
python fibo.py <arguments>
the code in the module will be executed, just as if you imported it
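This means that importing a module runs its top-level code just as running it as a regular Python script would, unless that code is guarded by a check on __name__, as mentioned right after this quotation. Note that the module body runs only on the first import in a given interpreter session; later imports reuse the already-loaded module, so a is built once per run of your program, not every time file2 uses it.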
Also, if you think about it, you are opening something, reading from it, and then doing some operations. How can you be sure that the content you are now reading from is the same as the one you had read the first time?
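One way to check how often a module body actually runs is to import a throwaway module twice and count its side effects. The sketch below is an illustration only: file1_demo and runs.log are hypothetical names, and the module is written to a temporary directory rather than being the file1 from the question.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module whose top-level code appends one line to a log file
# every time the module body is actually executed.
module_dir = tempfile.mkdtemp()
log_path = os.path.join(module_dir, "runs.log")
module_source = (
    "import os\n"
    "with open(os.path.join(os.path.dirname(__file__), 'runs.log'), 'a') as log:\n"
    "    log.write('executed\\n')\n"
    "a = ['alice', 'bob']\n"
)
with open(os.path.join(module_dir, "file1_demo.py"), "w") as handle:
    handle.write(module_source)

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

first = importlib.import_module("file1_demo")   # first import: runs the module body
second = importlib.import_module("file1_demo")  # second import: reuses the loaded module

with open(log_path) as log:
    run_count = len(log.readlines())
```

Both names refer to the same module object, and the log contains a single line. Starting the program again, or calling importlib.reload, runs the body again.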
I collected some tweets from the Twitter API and stored them in MongoDB. Exporting the data to a JSON file went fine, but when I wrote a Python script to read the JSON and convert it to a CSV, I got this traceback error:
json.decoder.JSONDecodeError: Extra data: line 367 column 1 (char 9745)
So, after digging around the internet I was pointed to check the actual JSON data in an online validator, which I did. This gave me the error of:
Multiple JSON root elements
from the site https://jsonformatter.curiousconcept.com/
Here are pictures of the 1st/2nd object beginning/end of the file:
or a link to the data here
Now, the problem is, I haven't found anything on the internet of how to handle that error. I'm not sure if it's an error with the data I've collected, exported, or if I just don't know how to work with it.
My end game with these tweets is to make a network graph. I was looking at either Networkx or Gephi, which is why I'd like to get a csv file.
Robert Moskal is right. If you can address the issue at the source and use the --jsonArray flag when you run mongoexport, it will make the problem easier. If you can't address it at the source, read the points below.
The code below will extract the individual json objects from the given file and convert them to Python dictionaries.
You can then apply your CSV logic to each individual dictionary.
If you are using the csv module, I would suggest using the unicodecsv module instead, as it handles the unicode data in your json objects.
import json
with open('path_to_your_json_file', 'r') as infile:
    json_block = []
    for line in infile:
        json_block.append(line)
        # a line starting with '}' closes one top-level object
        if line.startswith('}'):
            json_dict = json.loads(''.join(json_block))
            json_block = []
            print(json_dict)
If you want to convert it to CSV using pandas you can use the below code:
import json, pandas as pd
with open('path_to_your_json_file', 'r') as infile:
    json_block = []
    dictlist = []
    for line in infile:
        json_block.append(line)
        # a line starting with '}' closes one top-level object
        if line.startswith('}'):
            json_dict = json.loads(''.join(json_block))
            dictlist.append(json_dict)
            json_block = []
# build the frame from the collected dictionaries
df = pd.DataFrame(dictlist)
df.to_csv('out.csv', encoding='utf-8')
If you want to flatten out the json object you can use the pandas.json_normalize() method (pandas.io.json.json_normalize() in older pandas versions).
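The line-based splitting above relies on every top-level object closing with a '}' at the start of a line. A different technique, which does not depend on the layout, is to let the standard library decoder report where each document ends using json.JSONDecoder.raw_decode. This is a minimal sketch, with iter_json_objects as a hypothetical helper name and a made-up two-object sample in place of the real export:

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found in `text`, one after another."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so step over it first.
        if text[position].isspace():
            position += 1
            continue
        value, position = decoder.raw_decode(text, position)
        yield value


# Two pretty-printed documents back to back, with no enclosing array.
sample = '{\n  "id": 1,\n  "text": "first"\n}\n{\n  "id": 2,\n  "text": "second"\n}\n'
tweets = list(iter_json_objects(sample))
```

For a real file, the text would come from reading the whole export into memory, which is reasonable for small collections but may not suit very large ones.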
Elaborating on @MYGz's suggestion to use --jsonArray
Your post doesn't show how you exported the data from mongo. If you use the following via the terminal, you will get valid json from mongodb:
mongoexport --collection=somecollection --db=somedb --jsonArray --out=validfile.json
Replace somecollection, somedb and validfile.json with your target collection, target database, and desired output filename respectively.
The following:
mongoexport --collection=somecollection --db=somedb --out=validfile.json
...will NOT give you the results you are looking for because:
By default mongoexport writes data using one JSON document for every MongoDB document. Ref
A bit of a late reply, and I am not sure this was available at the time the question was posted. Anyway, there is now a simple way to import the mongoexport json data, as follows:
df = pd.read_json(filename, lines=True)
mongoexport writes each line as a json object in itself, instead of the whole file being one json document.
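To make the one-document-per-line layout concrete, here is a minimal standard-library sketch of the same idea without pandas. read_json_lines is a hypothetical helper name, and the in-memory sample stands in for an exported file that really does have one object per line:

```python
import io
import json


def read_json_lines(handle):
    """Parse one JSON document per non-empty line of an open text file."""
    return [json.loads(line) for line in handle if line.strip()]


# Stand-in for an export file with one document per line.
sample = io.StringIO(u'{"id": 1, "user": "a"}\n{"id": 2, "user": "b"}\n')
records = read_json_lines(sample)
```

This only works when each object sits on a single line; a pretty-printed export spreads one object over many lines and needs a different approach.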
I am trying to create a Python script that can take a JSON object and insert it into a headless Couchbase server. I have been able to successfully connect to the server and insert some data. I'd like to be able to specify the path of a JSON object and upsert that.
So far I have this:
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError
import json
cb = Bucket('couchbase://XXX.XXX.XXX?password=XXXX')
print cb.server_nodes
#tempJson = json.loads(open("myData.json","r"))
try:
    result = cb.upsert('healthRec', {'record': 'bob'})
    # result = cb.upsert('healthRec', {'record': tempJson})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
print(cb.get('healthRec').value)
I know that the first commented out line that loads the json is incorrect because it is expecting a string not an actual json... Can anyone help?
Thanks!
Figured it out:
with open('myData.json', 'r') as f:
    data = json.load(f)
try:
    result = cb.upsert('healthRec', {'record': data})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
I am looking into using cbdocloader, but this was my first step getting this to work. Thanks!
I know that you've found a solution that works for you in this instance but I thought I'd correct the issue that you experienced in your initial code snippet.
json.loads() takes a string as input and decodes that json string into a dictionary (or whatever custom object you use based on the object_hook). You were passing it a file handle, which is why you were seeing the issue.
There is actually a method json.load() which works as expected, as you have used in your eventual answer.
You would have been able to use it as follows (if you wanted something slightly less verbose than the with statement):
tempJson = json.load(open("myData.json","r"))
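The difference between the two functions can be seen side by side in the short sketch below. It runs without a Couchbase server; the temporary myData.json and its contents are made up for the illustration.

```python
import json
import os
import tempfile

# Create a small stand-in for myData.json in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "myData.json")
with open(path, "w") as handle:
    handle.write('{"record": "bob", "age": 42}')

# json.load reads from an open file object.
with open(path, "r") as handle:
    from_file = json.load(handle)

# json.loads parses text that is already in memory.
with open(path, "r") as handle:
    from_text = json.loads(handle.read())

# Passing the file object itself to json.loads raises TypeError.
try:
    with open(path, "r") as handle:
        json.loads(handle)
    raised = False
except TypeError:
    raised = True
```

Both routes produce the same dictionary; only the kind of input differs.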
As Kirk mentioned, though, if you have a large number of json documents to insert then it might be worth taking a look at cbdocloader, as it will handle all of this boilerplate code for you (with appropriate error handling and other functionality).
This readme covers the uses of cbdocloader and how to format your data correctly to allow it to load your documents into Couchbase Server.
The structure of the file is not important to me, so, following a previous solution that suggested "converting them to plain text and importing them with readLines", I changed the file type from ".doc/.docx" to ".txt" and ended up with an error:
file_list = list.files("D:/R/New",pattern="*.txt",full.names=F)
obj_list <- lapply(file_list,readLines)
Warning messages:
1: In FUN(c("adityar.txt":
incomplete final line found on 'adityar.txt'
I have tried to read the files with the help of a corpus as well but didn't get good results. The second solution here talks about PDF and Unix. Is there a better and faster approach? I am working on the Windows platform. Any help is appreciated.
Using Python, you can do this:
from docx import *
import json
document = opendocx("path_to_your_docx")
res = getdocumenttext(document)
You can save your script and call it from R using system().
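If installing the docx package is not an option, a rough alternative is to read the file with the standard library only. This is a sketch of a different technique, not what the docx package does internally: it assumes a .docx file is a ZIP archive whose main text lives in word/document.xml, and docx_paragraphs is a hypothetical helper name. It does not handle the older binary .doc format.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx file as a list of strings."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t runs.
        paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
    return paragraphs


# Build a tiny stand-in document so the sketch runs without a real Word file.
document_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
sample_path = os.path.join(tempfile.mkdtemp(), "sample.docx")
with zipfile.ZipFile(sample_path, "w") as archive:
    archive.writestr("word/document.xml", document_xml)

paragraphs = docx_paragraphs(sample_path)
```

This also shows why renaming a .docx file to .txt does not produce readable text: the file is still a compressed archive, not plain text.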