How to extract information from json? - python

I am trying to extract some information from JSON data. In the following code, I first extract the part of the JSON data that contains the information I want and store it in a file. Then I try to open this file and get the error shown after my code. Can you help me find where I went wrong?
import json
import re
input_file = 'path'
text = open(input_file).read()
experience = re.findall(r'Experience":{"positionsMpr":{"showSection":true," (.+?),"visible":true,"find_title":"Find others',text)
output_file = open ('/home/evi.nastou/Documenten/LinkedIn_data/Alewijnse/temp', 'w')
output_file.write('{'+experience[0]+'}')
output_file.close()
text = open('path/temp')
input_text = text.read()
data = json.load(input_text)
positions = json.dumps([s['companyName'] for s in data['positions']])
print positions
Error:
Traceback (most recent call last):
File "test.py", line 13, in <module>
data = json.load(input_text)
File "/home/evi.nastou/.pythonbrew/pythons/Python-2.7.2/lib/python2.7/json/__init__.py", line 274, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

You want to use json.loads() (note the s), or pass in the file object instead of the result of .read():
text = open('path/temp')
data = json.load(text)
json.load() takes an open file object, but you were passing it a string; json.loads() takes a string.
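To make the difference concrete, here is a minimal Python 3 sketch. The sample JSON string and the company names are made up for illustration, and io.StringIO stands in for a file opened with open():

```python
import io
import json

# Hypothetical sample data shaped like the 'positions' list in the question.
raw = '{"positions": [{"companyName": "Acme"}, {"companyName": "Globex"}]}'

# json.loads() parses a string that is already in memory.
data_from_string = json.loads(raw)

# json.load() reads from a file-like object; io.StringIO stands in for an open file.
data_from_file = json.load(io.StringIO(raw))

companies = [p["companyName"] for p in data_from_string["positions"]]
print(companies)
```

Both calls produce the same dictionary; the only difference is whether the input is a string or an object with a read() method.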

Related

Python Configparser. Whitespace causes AttributeError

I receive some files accompanied by an .ini file. I have to read the file names from the [FILES] section.
Sometimes there is extra whitespace in another section of the .ini file, which raises an exception in the ConfigParser module.
An example of a "bad" .ini file:
[LETTER]
SUBJECT=some text
some text
 and text with whitespace in the beggining
[FILES]
0=file1.txt
1=file2.doc
My code(Python 3.7):
import configparser
def get_files_from_ini_file(info_file):
    ini = configparser.ConfigParser(allow_no_value=True)
    ini.read(info_file) # ERROR is here
    if ini.has_section("FILES"):
        pocket_files = [ini.get("FILES", i) for i in ini.options("FILES")]
        return pocket_files
print(get_files_from_ini_file("D:\\bad.ini"))
Traceback (most recent call last):
File "D:/test.py", line 10, in <module>
print(get_files_from_ini_file("D:\\bad.ini"))
File "D:/test.py", line 5, in get_files_from_ini_file
ini.read(info_file) # ERROR
File "C:\Users\ap\AppData\Local\Programs\Python\Python37-32\lib\configparser.py", line 696, in read
self._read(fp, filename)
File "C:\Users\ap\AppData\Local\Programs\Python\Python37-32\lib\configparser.py", line 1054, in _read
cursect[optname].append(value)
AttributeError: 'NoneType' object has no attribute 'append'
I can't influence the files I receive, so is there any way to ignore this error? In fact, I only need to parse the [FILES] section.
I have tried empty_lines_in_values=False, with no result.
Maybe that's an invalid ini file and I should write my own parser?
If you only need the "FILES" part, a simple way is to:
open the file and read into a string
get the part after "[FILES]" using .split() method
add "[FILES]" before the string
use the configparser read_string method on the string
This is a hacky solution but it should work:
import configparser
def get_files_from_ini_file(info_file):
    with open(info_file, 'r') as file:
        ini_string = file.read()
    # Keep only the text after the last "[FILES]" header and restore the header.
    useful_part = "[FILES]" + ini_string.split("[FILES]")[-1]
    ini = configparser.ConfigParser(allow_no_value=True)
    ini.read_string(useful_part)
    pocket_files = []
    if ini.has_section("FILES"):
        pocket_files = [ini.get("FILES", i) for i in ini.options("FILES")]
    return pocket_files
print(get_files_from_ini_file("D:\\bad.ini"))
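The same idea can be tried without a file on disk. The sketch below works on an in-memory string that mirrors the "bad" example from the question. The helper name files_from_ini_text is hypothetical, and the approach assumes that [FILES] is the last section, or at least that everything after it is well-formed:

```python
import configparser

# In-memory copy of the "bad" ini file from the question (note the indented line).
BAD_INI = """[LETTER]
SUBJECT=some text
some text
 and text with whitespace in the beginning
[FILES]
0=file1.txt
1=file2.doc
"""


def files_from_ini_text(ini_text, section="FILES"):
    """Parse only the text from the last '[section]' header onwards."""
    header = "[" + section + "]"
    if header not in ini_text:
        return []
    # Drop everything before the header, then put the header back.
    useful_part = header + ini_text.split(header)[-1]
    ini = configparser.ConfigParser(allow_no_value=True)
    ini.read_string(useful_part)
    return [ini.get(section, option) for option in ini.options(section)]


file_names = files_from_ini_text(BAD_INI)
print(file_names)
```

Because the malformed [LETTER] section never reaches the parser, the indented continuation line cannot trigger the error.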

serialize a text file into a protobuf message

I have a serialized protobuf message that I can simply read and save in plain text in python with something like this:
import MyMessage
import sys
FilePath = sys.argv[1]
T = MyMessage.MyType()
f = open(FilePath, 'rb')
T.ParseFromString(f.read())
f.close()
print(T)
I can save this to a plain txt file and do what I want to do.
Now I need to do the inverse operation, i.e. reading the simple plain text file, already formatted in the right way, and save it as a protobuf message
import MyMessage
import sys
FilePath = sys.argv[1]
input = open("./input.txt", 'r')
T = MyMessage.MyType()
T.ParseFromString(input.readlines())
output.write(T.SerializeToString())
input.close()
output.close()
This fails with
Traceback (most recent call last):
File "MyFile.py", line 13, in <module>
T.ParseFromString(input.readlines())
File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\message.py", line 199, in ParseFromString
return self.MergeFromString(serialized)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\internal\python_message.py", line 1142, in MergeFromString
serialized = memoryview(serialized)
TypeError: memoryview: a bytes-like object is required, not 'list'
I am not a Python or protobuf expert, so I guess I am missing something trivial...
Any help?
Thanks :)
print(x) calls str(x), which for protobufs uses the human-readable "text format" representation.
To read back from that format, you can use the google.protobuf.text_format module:
import MyMessage
from google.protobuf import text_format
def parse_my_type(file_path):
    with open(file_path, 'r') as f:
        return text_format.Parse(f.read(), MyMessage.MyType())

parse Dutch NDW xml

I am trying to parse the XML file from the Dutch NDW, which contains the traffic speed on many Dutch motorways, updated every minute. I use this example file: http://www.ndw.nu/downloaddocument/e838c62446e862f5b6230be485291685/Reistijden.zip
I am trying to parse the travel time data into variables with Python, but I am struggling.
from xml.etree import ElementTree
import urllib2
url = "http://weburloffile.nl/ndw/Reistijden.xml"
response = urllib2.urlopen(url)
namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://datex2.eu/schema/2/2_0'
}
dom = ElementTree.fromstring(response.read)
names = dom.findall(
    'soap:Envelope'
    '/a:duration',
    namespaces,
)
#print names
for duration in names:
    print(duration.text)
I get this new error
Traceback (most recent call last):
File "test.py", line 9, in <module>
dom = ElementTree.fromstring(response.read)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1651, in feed
self._parser.Parse(data, 0)
TypeError: Parse() argument 1 must be string or read-only buffer, not instancemethod
How to parse this (complex) xml correctly?
-- changed it into read as suggested by comment
The problem isn't the XML parsing; it's that you are using the response object incorrectly. urllib2.urlopen returns a file-like object that does not have a content attribute, and passing response.read without parentheses hands the parser the method itself rather than the data. Instead, call read on it:
dom = ElementTree.fromstring(response.read())
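Once the document is loaded, the namespace lookup may still need adjusting, because fromstring already returns the root element (the Envelope itself). The Python 3 sketch below uses a tiny made-up document; the real NDW layout is not shown in the question, so the payload element and the values are assumptions, and only the two namespace URIs and the duration tag come from the question's code:

```python
from xml.etree import ElementTree

NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "a": "http://datex2.eu/schema/2/2_0",
}

# Hypothetical stand-in for the downloaded XML; the real file is far larger.
SAMPLE_XML = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <a:payload xmlns:a="http://datex2.eu/schema/2/2_0">
      <a:duration>42.0</a:duration>
      <a:duration>57.5</a:duration>
    </a:payload>
  </soap:Body>
</soap:Envelope>"""

# fromstring() returns the root element, which here is soap:Envelope.
root = ElementTree.fromstring(SAMPLE_XML)

# './/' searches all descendants of the root, at any depth.
durations = [float(el.text) for el in root.findall(".//a:duration", NAMESPACES)]
print(durations)
```

Searching with './/a:duration' avoids having to spell out the full path from the root, which is convenient when the exact nesting is not known in advance.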

Python - write Xml (formatted)

I wrote this Python script to create XML content, and I would like to write the "prettified" XML to a file (50% done):
My script so far:
import xml.dom.minidom
import xml.etree.ElementTree as ET
data = ET.Element("data")
project = ET.SubElement(data, "project")
project.text = "This project text"
rawString = ET.tostring(data, "utf-8")
reparsed = xml.dom.minidom.parseString(rawString)
cleanXml = reparsed.toprettyxml(indent=" ")
# This prints the prettified xml i would like to save to a file
print cleanXml
# This part does not work, the only parameter i can pass is "data"
# But when i pass "data" as a parameter, a xml-string is written to the file
tree = ET.ElementTree(cleanXml)
tree.write("config.xml")
The error i get when i pass cleanXml as parameter:
Traceback (most recent call last):
File "app.py", line 45, in <module>
tree.write("config.xml")
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 817, in write
self._root, encoding, default_namespace
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 876, in _namespaces
iterate = elem.getiterator # cET compatibility
AttributeError: 'unicode' object has no attribute 'getiterator'
Does anybody know how I can get my prettified XML into a file?
Thanks and Greetings!
The ElementTree constructor takes a root element (and optionally a file), not a string, which is why passing cleanXml fails. To build an element from a string, use ET.fromstring.
However, that isn't what you want. Just open a file and write the string directly:
with open("config.xml", "w") as config_file:
    config_file.write(cleanXml)
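Putting the question's steps and this answer together, a complete Python 3 sketch could look like the following. The helper name write_pretty_xml is hypothetical, and a temporary directory is used here only so the example does not overwrite a real config.xml:

```python
import os
import tempfile
import xml.dom.minidom
import xml.etree.ElementTree as ET


def write_pretty_xml(root, path, indent="  "):
    """Serialize an Element, re-indent it with minidom, and save it as text."""
    raw = ET.tostring(root, encoding="utf-8")  # bytes in Python 3
    pretty = xml.dom.minidom.parseString(raw).toprettyxml(indent=indent)
    with open(path, "w", encoding="utf-8") as config_file:
        config_file.write(pretty)
    return pretty


# Same element tree as in the question.
data = ET.Element("data")
project = ET.SubElement(data, "project")
project.text = "This project text"

out_path = os.path.join(tempfile.mkdtemp(), "config.xml")
pretty_xml = write_pretty_xml(data, out_path)
print(pretty_xml)
```

The prettified string is written as plain text, so no ElementTree object is needed for the final step.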

reading a file with json data with python throws an error that I cannot identify

I have the following json file named json.txt with the following data,
{"id":99903727,"nickname":"TEST_MLA_OFF","registration_date":"2010-12-03T14:19:33.000-04:00","country_id":"AR","user_type":"normal","logo":null,"points":0,"site_id":"MLA","permalink":"http://perfil.mercadolibre.com.ar/TEST_MLA_OFF","seller_reputation":{"level_id":null,"power_seller_status":null,"transactions":{"period":"12 months","total":25,"completed":25,"canceled":0,"ratings":{"positive":0,"negative":0,"neutral":1}}},"status":{"site_status":"deactive"}}
I obtained it using wget. I tried to load that json data with python using the following python code,
json_data = json.load('json.txt')
data = json.load(json_data)
json_data.close()
print data
but that throws the following error,
Traceback (most recent call last):
File "json-example.py", line 28, in <module>
main()
File "json-example.py", line 21, in main
json_data = json.load('json.txt')
File "/opt/sage-4.6.2-linux-64bit-ubuntu_8.04.4_lts-x86_64-Linux/local/lib/python/json/__init__.py", line 264, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
I couldn't find the reason for the error by googling.
Best regards.
Even better practice is to use the with statement.
with open('json.txt', 'r') as json_file:
    data = json.load(json_file)
This makes sure the file gets closed properly without you worrying about it.
You need to give json.load a file stream object:
json_file = open('json.txt')
data = json.load(json_file)
json_file.close()
print data
