Importing data from JSON using Python

Hi, I am new to Python and I am trying to import a dataset from a JSON file in the repository using Python:
import json
with open('dataforms.json', 'r') as f:
    data = json.load(f)
for row in data:
    print(row[Flood])
This code is throwing the following error:
Traceback (most recent call last):
File "C:\Users\Ayush\Desktop\js2.py", line 5, in <module>
print (row[Flood])
NameError: name 'Flood' is not defined

I'm assuming Flood is a string? In which case you need to put quotes around it, or Python thinks it is a variable name.
print(row['Flood'])
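For completeness, a minimal runnable sketch of the corrected loop (the sample records below are hypothetical stand-ins for the contents of dataforms.json):

```python
import json

# Hypothetical sample data standing in for dataforms.json:
# a list of records that each have a "Flood" key.
records = json.loads('[{"Flood": "low"}, {"Flood": "high"}]')

for row in records:
    print(row["Flood"])  # the key must be a quoted string, not a bare name
```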

Related

How do I write the time from datetime to a file in Python?

I'm trying to have my Python code write everything it does to a log, with a timestamp, but it doesn't seem to work. This is my current code:
filePath = Path('.')
time = datetime.datetime.now()
bot_log = ["", "Set up the file path thingy"]
with open('bot.log', 'a') as f:
    f.write('\n'.join(bot_log) %
            datetime.datetime.now().strftime("%d-%b-%Y (%H:%M:%S.%f)"))
print(bot_log[0])
But when I run it, it says:
Traceback (most recent call last):
File "c:\Users\Name\Yuna-Discord-Bot\Yuna Discord Bot.py", line 15, in <module>
f.write('\n'.join(bot_log)%
TypeError: not all arguments converted during string formatting
I have tried multiple things to fix it, and this is the latest one. Is there something I'm doing wrong or missing? I also want the time to be in front of the log message, but I don't think it would do that (even if it worked).
You need to put a "%s" placeholder somewhere in the string before applying % string formatting.
Try this:
filePath = Path('.')
time = datetime.datetime.now()
bot_log = "%s Set up the file path thingy\n"
with open('bot.log', 'a') as f:
    f.write(bot_log % datetime.datetime.now().strftime("%d-%b-%Y (%H:%M:%S.%f)"))
print(bot_log)
It looks like you want to write three strings to your file as separate lines. I've rearranged your code to create a single list to pass to writelines, which expects an iterable:
filePath = Path('.')
time = datetime.datetime.now()
bot_log = ["", "Set up the file path thingy"]
with open('bot.log', 'a') as f:
    bot_log.append(datetime.datetime.now().strftime("%d-%b-%Y (%H:%M:%S.%f)"))
    f.writelines('\n'.join(bot_log))
print(bot_log[0])
EDIT: From the comments the desire is to prepend the timestamp to the message and keep it on the same line. I've used f-strings as I prefer the clarity they provide:
import datetime
from pathlib import Path

filePath = Path('.')
with open('bot.log', 'a') as f:
    time = datetime.datetime.now()
    msg = "Set up the file path thingy"
    f.write(f"""{time.strftime("%d-%b-%Y (%H:%M:%S.%f)")} {msg}\n""")
You could also look at the logging module which does a lot of this for you.
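As a rough sketch of that suggestion (the format string below is my own choice, not something from the question), the logging module can prepend the timestamp for you:

```python
import logging

# Log to bot.log, prefixing every message with a timestamp.
logging.basicConfig(
    filename="bot.log",
    level=logging.INFO,
    format="%(asctime)s %(message)s",
    datefmt="%d-%b-%Y (%H:%M:%S)",
    force=True,  # Python 3.8+; replaces any previously configured handlers
)

logging.info("Set up the file path thingy")
```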

Manipulating docm files with Python

I was looking at how to manipulate docm files with Python and found this library named python-docx-docm.
I followed the documentation and tried a simple program:
import docx

doc = docx.Document(r'C:\Users\clemdcz\Desktop\my_file.docm')
all_paras = doc.paragraphs
for para in all_paras:
    print(para.text)
    print("-----------")
To which I am getting the following error:
Traceback (most recent call last):
File "c:\Users\clemdcz\Desktop\Projet\intoWORD.py", line 3, in <module>
doc = docx.Document(r'C:\Users\clemdcz\Desktop\my_file.docm')
File "C:\Users\clemdcz\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\api.py", line 36, in Document
return document_part.document
AttributeError: 'Part' object has no attribute 'document'
If I then try with a docx file, it works fine and shows me the correct data. So I was wondering how to fix this error?
The documentation doesn't seem to give information on docm files, but I read that it was supposed to work the same for both docm and docx. I couldn't find any other libraries that could manipulate docm files with Python.

Check file before calling SimpleITK.SimpleITK.ImageFileReader.ReadImageInformation()

I am processing a set of DICOM files, some of which have image information and some of which don't. If a file has image information, the following code works fine.
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(fileName)
file_reader.ReadImageInformation()
However, if the file does not have image information, I get the following error.
Traceback (most recent call last):
File "<ipython-input-61-d187aed107ed>", line 5, in <module>
file_reader.ReadImageInformation()
File "/home/peter/anaconda3/lib/python3.7/site-packages/SimpleITK/SimpleITK.py", line 8673, in ReadImageInformation
return _SimpleITK.ImageFileReader_ReadImageInformation(self)
RuntimeError: Exception thrown in SimpleITK ImageFileReader_ReadImageInformation: /tmp/SimpleITK/Code/IO/src/sitkImageReaderBase.cxx:107:
sitk::ERROR: Unable to determine ImageIO reader for "/path/115.dcm"
If the DICOM file has no image information, I would like to just ignore the file rather than calling ReadImageInformation(). Is there a way to check whether ReadImageInformation() will work before it is called? I tried the following, and they are no different between files where ReadImageInformation() works and files where it does not.
file_reader.GetImageIO()
file_reader.GetMetaDataKeys() # Crashes
file_reader.GetDimension()
I would just put an exception handler around it to catch the error (SimpleITK raises RuntimeError here, so catch that rather than a bare except). So it'd look something like this:
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(fileName)
try:
    file_reader.ReadImageInformation()
except RuntimeError:
    print(fileName, "has no image information")

'dict' object has no attribute 'read'

Running Python on a Windows system I encountered issues with loading a JSON file into memory. What is wrong with my code?
>>> import json
>>> array = json.load({"name":"Name","learning objective":"load json files for data analysis"})
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
array = json.load({"name":"Name","learning objective":"load json files for data analysis"})
File "C:\Python34\lib\json\__init__.py", line 265, in load
return loads(fp.read(),
AttributeError: 'dict' object has no attribute 'read'
Since you want to convert it into json format, you should use json.dumps() instead of json.load(). This would work:
>>> import json
>>> array = json.dumps({"name":"Galen","learning objective":"load json files for data analysis"})
>>> array
'{"learning objective": "load json files for data analysis", "name": "Galen"}'
And to parse it back:
>>> a = json.loads(array)
>>> a["name"]
u'Galen'
If you want to load JSON from a string, you need to add quotes around it, and there is a different method for reading from a file than from a variable: the one for a string ends with an "s" (json.loads), the one for a file object (json.load) doesn't.
import json

my_json = '{"my_json": "value"}'
res = json.loads(my_json)
print(res)
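A minimal sketch of the load/loads distinction (the file name here is hypothetical, and the example writes it first so it is self-contained):

```python
import json

# json.loads parses a JSON *string*:
data = json.loads('{"name": "Galen"}')

# json.load parses an open *file object*:
with open("example.json", "w") as f:
    json.dump(data, f)  # write the file so the example can read it back
with open("example.json") as f:
    same_data = json.load(f)

print(data == same_data)  # → True
```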
As you said, it is wrong: you forgot the ' before and after the JSON text. Also, since this is a string rather than a file object, you need json.loads, not json.load:
import json
array = json.loads('{"name":"Galen","learning objective":"load json files for data analysis"}')
I had the same mistake :)
dumps works, but it is not the same: dumps serializes to a JSON string, while load/loads are what you use for parsing JSON.
https://docs.python.org/2/library/json.html

How to overcome memory issue when sequentially appending files to one another

I am running the following script in order to append files to one another, cycling through months and years if the file exists. I have just tested it with a larger dataset, where I would expect the output file to be roughly 600 MB in size. However, I am running into memory issues. Firstly, is it normal to run into memory issues (my PC has 8 GB of RAM)? I am not sure how I am eating all of this memory space.
Code I am running:
import datetime, os
import StringIO

stored_data = StringIO.StringIO()
start_year = "2011"
start_month = "November"
first_run = False
current_month = datetime.date.today().replace(day=1)
possible_month = datetime.datetime.strptime('%s %s' % (start_month, start_year), '%B %Y').date()
while possible_month <= current_month:
    csv_filename = possible_month.strftime('%B %Y') + ' MRG.csv'
    if os.path.exists(csv_filename):
        with open(csv_filename, 'rb') as current_csv:
            if first_run != False:
                next(current_csv)
            else:
                first_run = True
            stored_data.writelines(current_csv)
    possible_month = (possible_month + datetime.timedelta(days=31)).replace(day=1)
if stored_data:
    contents = stored_data.getvalue()
    with open('FullMergedData.csv', 'wb') as output_csv:
        output_csv.write(contents)
The traceback I receive:
Traceback (most recent call last):
File "C:\code snippets\FullMerger.py", line 23, in <module>
contents = stored_output.getvalue()
File "C:\Python27\lib\StringIO.py", line 271, in getvalue
self.buf += ''.join(self.buflist)
MemoryError
Any ideas how to achieve a workaround or make this code more efficient to overcome this issue? Many thanks,
AEA
Edit1
Upon running the code supplied by alKid, I received the following traceback.
Traceback (most recent call last):
File "C:\FullMerger.py", line 22, in <module>
output_csv.writeline(line)
AttributeError: 'file' object has no attribute 'writeline'
I fixed the above by changing it to writelines; however, I still received the following traceback.
Traceback (most recent call last):
File "C:\FullMerger.py", line 19, in <module>
next(current_csv)
StopIteration
In stored_data, you're trying to store the whole file, and since it's too large, you're getting the error you're seeing.
One solution is to write the file line by line. It is far more memory-efficient, since you only ever hold one line of data in the buffer instead of the whole 600 MB.
In short, the structure can be something this:
with open('FullMergedData.csv', 'a') as output_csv:  # 'a' appends the result to the file
    with open(csv_filename, 'rb') as current_csv:
        if first_run:
            next(current_csv, None)  # skip the header of every file after the first
        first_run = True  # after the first file, start skipping headers
        for line in current_csv:  # loop through the remaining lines
            output_csv.write(line)  # write it line by line
This should fix your problem. Hope this helps!
Your memory error is because you store all the data in a buffer before writing it. Consider using something like shutil.copyfileobj to copy directly from one open file object to another; this only buffers small amounts of data at a time. You could also do it line by line, which has much the same effect.
Update
Using copyfileobj should be much faster than writing the file line by line. Here is an example of how to use copyfileobj. This code opens two files, skips the first line of the input file if skip_first_line is True and then copies the rest of that file to the output file.
import shutil

skip_first_line = True
with open('FullMergedData.csv', 'a') as output_csv:
    with open(csv_filename, 'rb') as current_csv:
        if skip_first_line:
            current_csv.readline()
        shutil.copyfileobj(current_csv, output_csv)
Notice that if you're using copyfileobj you'll want to use current_csv.readline() instead of next(current_csv). That's because iterating over a file object buffers part of the file, which is normally very useful, but you don't want that in this case.
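Putting the pieces together, here is a sketch of the whole month-by-month merge built around copyfileobj (the 'Month Year MRG.csv' naming follows the question; treat this as an illustration rather than the poster's final script):

```python
import datetime
import os
import shutil

start = datetime.datetime.strptime('November 2011', '%B %Y').date()
current_month = datetime.date.today().replace(day=1)

first_file = True
possible_month = start
with open('FullMergedData.csv', 'wb') as output_csv:
    while possible_month <= current_month:
        csv_filename = possible_month.strftime('%B %Y') + ' MRG.csv'
        if os.path.exists(csv_filename):
            with open(csv_filename, 'rb') as current_csv:
                if not first_file:
                    current_csv.readline()  # skip the header of later files
                first_file = False
                shutil.copyfileobj(current_csv, output_csv)
        # jump to the first day of the next month
        possible_month = (possible_month + datetime.timedelta(days=31)).replace(day=1)
```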
