Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have written the code below to convert a CSV file to an XML file. The input file has 10 million records. The problem is that it runs for hours with 10 million records, while with a small number of records, say 2000, it takes 5-10 seconds.
Is there a way to do this more efficiently, in less time?
import csv
import sys
import os
from xml.dom.minidom import Document

filename = sys.argv[1]
filename = os.path.splitext(filename)[0] + '.xml'
pathname = "/tmp/"
output_file = pathname + filename

f = sys.stdin
reader = csv.reader(f)
fields = next(reader)
fields = [x.lower() for x in fields]
fieldsR = fields

doc = Document()
dataRoot = doc.createElement("rowset")
dataRoot.setAttribute('xmlns:xsi', "http://www.w3.org/2001/XMLSchema-instance")
dataRoot.setAttribute('xsi:schemaLocation', "./schema.xsd")
doc.appendChild(dataRoot)

for line in reader:
    dataElt = doc.createElement("row")
    for i in range(len(fieldsR)):
        dataElt.setAttribute(fieldsR[i], line[i])
        dataRoot.appendChild(dataElt)

xmlFile = open(output_file, 'w')
xmlFile.write(doc.toprettyxml(indent='\t'))
xmlFile.close()
sys.stdout.write(output_file)
I don't know Python or Minidom, but you seem to be executing the line
dataRoot.appendChild(dataElt)
once for every field in every row, rather than once for every row.
Your performance numbers suggest that something is very wrong here; I don't know whether this is it. With 2000 records I would expect the time to be measured in milliseconds.
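If that is the cause, the fix is to move the append out of the inner loop so each row element is attached once, after all its attributes are set. A sketch against the code above:

for line in reader:
    dataElt = doc.createElement("row")
    for i in range(len(fieldsR)):
        dataElt.setAttribute(fieldsR[i], line[i])
    dataRoot.appendChild(dataElt)  # once per row, not once per field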
Have to say I'm constantly amazed how people write complex procedural code for this kind of thing when it could be done in half a dozen lines of XSLT or XQuery.
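Staying in Python, the bigger win at 10 million rows is to avoid building the whole DOM and pretty-printing it at the end: minidom holds every node in memory, and toprettyxml builds one giant string. A minimal streaming sketch that writes each row as it is read (the output path is a placeholder, and it assumes the lower-cased header names are valid XML attribute names):

import csv
import sys
from xml.sax.saxutils import quoteattr

reader = csv.reader(sys.stdin)
fields = [x.lower() for x in next(reader)]

with open("/tmp/output.xml", "w") as out:
    out.write('<?xml version="1.0"?>\n')
    out.write('<rowset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
              'xsi:schemaLocation="./schema.xsd">\n')
    for line in reader:
        # quoteattr escapes each value and wraps it in quotes
        attrs = " ".join("%s=%s" % (name, quoteattr(value))
                         for name, value in zip(fields, line))
        out.write("\t<row %s/>\n" % attrs)
    out.write("</rowset>\n")

Memory use stays constant regardless of row count, and the file is written in a single pass.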
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 19 hours ago.
I am trying to pull the values of all pin and p_status fields from the sample data below into another file:
{'exchange':'vs',
'msg_count':660,
'payload': {"allocation":"645985649","p_status":"ordered","pin":"323232134455","bytes":"998"},
{'exchange':'vse',
So far I have only been able to load and pretty-print my JSON file using the code below:
import json
import pprint
file = open('/Users/sjain34/Downloads/jsonviewer.json','r')
data = json.load(file)
my_data= pprint.pprint(data)
print(my_data)
Try this:

import pandas as pd

df = pd.read_json('/Users/sjain34/Downloads/jsonviewer.json')
preferred_df = df[["p_status", "pin"]]  # select just the two columns
preferred_df.to_csv('file_name.csv', index=False)

This will write a CSV with just the statuses and pins.
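Note that read_json assumes the two fields appear as top-level columns; in the sample shown they are nested under payload. If the file is valid JSON and parses to a list of records like those in the question, a plain json/csv pass may be more direct. A sketch (the output file name is a placeholder):

import csv
import json

with open('/Users/sjain34/Downloads/jsonviewer.json') as f:
    records = json.load(f)  # assumed: a list of record dicts

with open('pins.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['pin', 'p_status'])
    for record in records:
        payload = record.get('payload', {})
        writer.writerow([payload.get('pin'), payload.get('p_status')])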
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 days ago.
I am using the latest version of Python.
While using the csv module, I load the file airports.csv, but it throws an error:
Traceback (most recent call last):
  File "c:\Users\soumy\OneDrive\Desktop\aatbm\main.py", line 6, in <module>
    with open(path, encoding="utf-8") as fp:
NameError: name 'path' is not defined
My CSV file contains a large amount of data: 74673 rows and 15 columns.
Can you help me out with the code given below?
print('Airport Management System')
import csv

filename = open('airports.csv', 'r')
file = csv.DictReader(filename)

name = []
for col in file:
    name.append(col['name'])
print(name)
I want to make a console-based application that displays the list of airports in the world according to the country entered, but it shows an error while displaying the airport names. The output should be normal.
I was trying to follow GeeksforGeeks but didn't succeed.
It must display on the Python console, not in a Jupyter notebook.
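Note that the traceback refers to a variable named path, which the posted code never defines, so the error comes from a different version of the script than the one shown. For the country-based listing itself, a minimal sketch (the column names 'name' and 'iso_country' are assumptions; check the header row of airports.csv):

import csv

print('Airport Management System')
country = input('Enter a country code: ')

with open('airports.csv', encoding='utf-8', newline='') as f:
    for row in csv.DictReader(f):
        # compare case-insensitively against the assumed country column
        if row.get('iso_country', '').lower() == country.lower():
            print(row['name'])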
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 5 days ago.
import aspose.words as aw
import os
import glob
import openpyxl
import json
import aspose.cells
from aspose.cells import License, Workbook, FileFormatType

workbook = Workbook("bookwithChart.xlsx")
os.chdir(os.path.join(r"C:\Users\13216\Downloads\pythontests"))
docx_files = glob.glob("*.docx")

for files in docx_files:
    doc = aw.Document(files)
    doc.save("document1.docx")
    doc = aw.Document("document1.docx")
    doc.save("html_output.html", aw.SaveFormat.HTML)
    book = Workbook("html_output.html")
    book.save("word-to-json.json", FileFormatType.JSON)
I need to convert a set of Word documents to JSON. It works great for one document. However, when I change the path and document name to test other documents, the output doesn't change: the JSON file returns the same output as the initial test document.
I tried changing the save commands for the JSON and HTML files; it didn't work. I assume the program is storing the output from the very first test document ("document1.docx"). I have tried inputting different documents many times, but the output does not change.
There is no need to open and re-save the DOCX document in your code. Also, you save all the documents into the same output file, so it is overwritten on each iteration. You can modify your code like this:
import aspose.words as aw
import aspose.cells as ac
import os
import glob

os.chdir(os.path.join(r"C:\Temp"))
docx_files = glob.glob("*.docx")

i = 0
for files in docx_files:
    doc = aw.Document(files)
    doc.save("tmp.html", aw.SaveFormat.HTML)
    book = ac.Workbook("tmp.html")
    book.save("word-to-json_" + str(i) + ".json", ac.FileFormatType.JSON)
    i += 1
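A small variant of the same approach names each output after its source document instead of using a counter, which makes the results easier to match up (same Aspose calls as above; a sketch):

import os
import glob
import aspose.words as aw
import aspose.cells as ac

os.chdir(r"C:\Temp")

for docx in glob.glob("*.docx"):
    base = os.path.splitext(docx)[0]
    # convert DOCX -> HTML -> JSON, keeping the source file's base name
    aw.Document(docx).save(base + ".html", aw.SaveFormat.HTML)
    ac.Workbook(base + ".html").save(base + ".json", ac.FileFormatType.JSON)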
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
Simply put, I need to retrieve the current currencies and rates from the European Central Bank, which are published in XML format, and convert them to a CSV file using Python. My code creates a file, but it does not write the values I need.
XML follows here:
https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml?
This is my code, but it does not work; please help.
import xml.etree.ElementTree as ET
import requests
import csv

kurzbanky_xml = requests.get("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml")
root = ET.fromstring(kurzbanky_xml.text)

with open('banka.csv', 'w', newline='') as Currency_Rate:
    csvwriter = csv.writer(Currency_Rate)
    csvwriter.writerow(['currency', 'rate'])
    for member in root.iterfind('Cube'):
        cur = cube.attrib['currency']
        rat = cube.attrib['rate']
        csvwriter.writerow([cur, rat])
You can use the xmltodict library to parse the XML into a Python dict and then iterate over it:
import csv
import requests
import xmltodict

r = requests.get("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml").text
data = xmltodict.parse(r)['gesmes:Envelope']['Cube']['Cube']

# xmltodict exposes XML attributes with an '@' prefix by default
with open('{}.csv'.format(data['@time']), 'w', newline='') as f:
    csvwriter = csv.writer(f)
    csvwriter.writerow(['currency', 'rate'])
    for cur in data['Cube']:
        csvwriter.writerow([cur['@currency'], cur['@rate']])
Output 2019-03-27.csv file:
currency,rate
USD,1.1261
JPY,124.42
BGN,1.9558
CZK,25.797
DKK,7.4664
GBP,0.85118
etc.
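For reference, the original ElementTree approach fails for two reasons: the loop binds member but reads cube, and the ECB feed puts its Cube elements in a namespace, so iterfind('Cube') matches nothing. A sketch of a corrected version (the namespace URI is taken from the feed itself; verify it against the current document):

import csv
import requests
import xml.etree.ElementTree as ET

NS = {'e': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}

xml_text = requests.get("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml").text
root = ET.fromstring(xml_text)

with open('banka.csv', 'w', newline='') as f:
    csvwriter = csv.writer(f)
    csvwriter.writerow(['currency', 'rate'])
    # Cube[@currency] selects only the per-currency entries
    for cube in root.findall('.//e:Cube[@currency]', NS):
        csvwriter.writerow([cube.get('currency'), cube.get('rate')])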
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I have a huge CSV and I want to split it into 3 random files of almost* equal size.
*almost: the size cannot be divided by 3
I was thinking of creating 3 blank lists, then in a for loop randomly choosing an index from range(0, len(mycsv)) and appending it to one of the lists. Then I would create a CSV from the rows in the first list, and so on. But I think this will be quite slow. Is there any built-in way, or an easier one than mine?
For each line of your CSV, randomly write the line to one of three blank CSV files. For 100,000 lines, it should not take long.
import random

with open("mycsv.csv") as fr:
    with open("1.csv", "w") as f1, open("2.csv", "w") as f2, open("3.csv", "w") as f3:
        for line in fr:
            f = random.choice([f1, f2, f3])
            f.write(line)
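If the three files must be as close to equal as possible (counts differing by at most one) rather than equal only on average, a round-robin variant trades the randomness of assignment for an exact split:

import itertools

with open("mycsv.csv") as fr:
    with open("1.csv", "w") as f1, open("2.csv", "w") as f2, open("3.csv", "w") as f3:
        # cycle through the three files so line counts differ by at most one
        for out, line in zip(itertools.cycle([f1, f2, f3]), fr):
            out.write(line)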