Split a huge CSV into three random files in Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I have a huge CSV and I want to split it into 3 random files of almost* equal size.
*almost: the number of lines may not be divisible by 3
I was thinking of creating 3 blank lists, then, in a for loop, randomly choosing a number from range(0, len(mycsv)) and appending it to one of the lists. Then I would create a CSV with the rows from the first list, and so on. But I think this would be quite slow. Is there a built-in way, or an easier one than mine?

For each line of your CSV, randomly write that line to one of three output files. For 100,000 lines, it should not take long.
import random

with open("mycsv.csv") as fr:
    # open the three output files and pick one at random for each input line
    with open("1.csv", "w") as f1, open("2.csv", "w") as f2, open("3.csv", "w") as f3:
        for line in fr:
            f = random.choice([f1, f2, f3])
            f.write(line)
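Note that random.choice only makes the three files equal in expectation. If the line counts must differ by at most one, a hedged round-robin sketch (here only the order of the output files is randomized, not each line's destination):

import random

# open the three output files and shuffle them so the file that receives
# any "extra" lines is chosen at random
handles = [open("1.csv", "w"), open("2.csv", "w"), open("3.csv", "w")]
random.shuffle(handles)
with open("mycsv.csv") as fr:
    for i, line in enumerate(fr):
        handles[i % 3].write(line)  # line counts differ by at most one
for h in handles:
    h.close()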

Related

While using the CSV module to read out the columns in Python, it shows an error [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 days ago.
I am using the latest version of Python. While using the csv module, I loaded the file airports.csv, but it throws an error:
Traceback (most recent call last):
  File "c:\Users\soumy\OneDrive\Desktop\aatbm\main.py", line 6, in <module>
    with open(path, encoding="utf-8") as fp:
NameError: name 'path' is not defined
My CSV file contains a lot of data: 74673 rows and 15 columns. Can you help me out with the code given below?
print('Airport Management System')
import csv

filename = open('airports.csv', 'r')
file = csv.DictReader(filename)
name = []
for col in file:
    name.append(col['name'])
print(name)
I want to make a console-based application where I can display the list of airports in the world according to the country entered, but it shows an error while displaying the airport names.
The output must be normal.
I was trying to follow GeeksforGeeks but didn't succeed.
It must display on the Python console, not in a Jupyter notebook.
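The traceback refers to a variable path that is never defined, while the snippet shown opens the file directly, so the error comes from a different version of the script. A minimal sketch of a console version, assuming an OurAirports-style header with name and iso_country columns (adjust these to your file's actual header row):

import csv
from collections import defaultdict

path = "airports.csv"  # define path before using it, which fixes the NameError

# group airport names by country code; 'name' and 'iso_country' are assumed column names
by_country = defaultdict(list)
with open(path, encoding="utf-8", newline="") as fp:
    for row in csv.DictReader(fp):
        by_country[row["iso_country"]].append(row["name"])

country = input("Enter a country code (e.g. US): ").strip().upper()
for airport in by_country.get(country, []):
    print(airport)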

How to replace image paths in xlsx with the original images using Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 months ago.
I have an Excel file where one column holds an image path.
Below is the sample Excel file.
I have the images in an img folder.
How can I replace the image-path column with the actual images using Python (pandas, an HTML method, or any other method)?
Can we use file:///[file_name] to view the image in a browser? I have seen that with HTML we can get the images into Excel; how do I implement this with local files?
If all the files are .png, you could include something like this (assuming img/... is a valid folder path):
import xlwings as xw

wb = xw.Book("files.xlsx")
ws = wb.sheets("sheet_name")
# loop through cells in the column, expanding downwards from cell E2
for i, cell in enumerate(ws.range("E2").options(expand="vertical").value):
    # add pictures using the file path, anchored in column F, same row
    ws.pictures.add(cell + ".png", anchor=ws.range("F" + str(i + 2)))
This appends ".png" to the end of the file path.
You can change the size too, see the documentation.

Efficient way of converting a CSV file to an XML file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have written the code below to convert a CSV file to an XML file. The input file has 10 million records. The problem is that it runs for hours with 10 million records, while with a smaller number of records, like 2000, it takes 5-10 seconds.
Is there a way to do it more efficiently, in less time?
import csv
import sys
import os
from xml.dom.minidom import Document

filename = sys.argv[1]
filename = os.path.splitext(filename)[0] + '.xml'
pathname = "/tmp/"
output_file = pathname + filename

f = sys.stdin
reader = csv.reader(f)
fields = next(reader)
fields = [x.lower() for x in fields]
fieldsR = fields

doc = Document()
dataRoot = doc.createElement("rowset")
dataRoot.setAttribute('xmlns:xsi', "http://www.w3.org/2001/XMLSchema-instance")
dataRoot.setAttribute('xsi:schemaLocation', "./schema.xsd")
doc.appendChild(dataRoot)

for line in reader:
    dataElt = doc.createElement("row")
    for i in range(len(fieldsR)):
        dataElt.setAttribute(fieldsR[i], line[i])
        dataRoot.appendChild(dataElt)

xmlFile = open(output_file, 'w')
xmlFile.write(doc.toprettyxml(indent='\t'))
xmlFile.close()
sys.stdout.write(output_file)
I don't know Python or Minidom, but you seem to be executing the line
dataRoot.appendChild(dataElt)
once for every field in every row, rather than once for every row.
Your performance numbers suggest that something is very wrong here; I don't know if this is it. With 2000 records I would expect to measure the time in milliseconds.
Have to say I'm constantly amazed how people write complex procedural code for this kind of thing when it could be done in half a dozen lines of XSLT or XQuery.
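Staying in Python, here is a hedged streaming sketch (an alternative approach, not the original poster's code): write each row out as it is read, instead of building a 10-million-node DOM and pretty-printing it at the end. quoteattr from the standard library quotes and escapes attribute values.

import csv
import sys
from xml.sax.saxutils import quoteattr

reader = csv.reader(sys.stdin)
fields = [name.lower() for name in next(reader)]

with open("/tmp/output.xml", "w") as out:  # output path is an assumption
    out.write('<rowset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
              ' xsi:schemaLocation="./schema.xsd">\n')
    for row in reader:
        # build escaped name="value" pairs for one row and write it immediately
        attrs = " ".join("%s=%s" % (name, quoteattr(value))
                         for name, value in zip(fields, row))
        out.write("\t<row %s/>\n" % attrs)
    out.write("</rowset>\n")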

Retrieving XML from a URL to CSV [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
Simply put, I need to retrieve the current currencies and rates from the European Central Bank, which publishes them in XML format, and convert them to a CSV file using Python. My code creates a file, but it does not write the things I need.
The XML is here:
https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml?
This is my code, but it does not work. Please help, guys.
import xml.etree.ElementTree as ET
import requests
import csv

kurzbanky_xml = requests.get("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml")
root = ET.fromstring(kurzbanky_xml.text)
with open('banka.csv', 'w', newline='') as Currency_Rate:
    csvwriter = csv.writer(Currency_Rate)
    csvwriter.writerow(['currency', 'rate'])
    for cube in root.iterfind('Cube'):
        cur = cube.attrib['currency']
        rat = cube.attrib['rate']
        csvwriter.writerow([cur, rat])
You can use the xmltodict lib to parse the XML into a dict and then iterate over it:
import csv
import requests
import xmltodict

r = requests.get("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml").text
# xmltodict prefixes attributes with '@' by default
data = xmltodict.parse(r)['gesmes:Envelope']['Cube']['Cube']

with open('{}.csv'.format(data['@time']), 'w', newline='') as f:
    csvwriter = csv.writer(f)
    csvwriter.writerow(['currency', 'rate'])
    for cur in data['Cube']:
        csvwriter.writerow([cur['@currency'], cur['@rate']])
Output 2019-03-27.csv file:
currency,rate
USD,1.1261
JPY,124.42
BGN,1.9558
CZK,25.797
DKK,7.4664
GBP,0.85118
etc.
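Alternatively, the asker's ElementTree approach can be made to work. The ECB feed puts every element in a default namespace, so a bare iterfind('Cube') matches nothing; this sketch iterates with the namespaced tag and keeps only the Cube elements that carry a currency attribute:

import csv
import xml.etree.ElementTree as ET
import requests

# the default namespace used by the ECB daily reference-rate feed
NS = '{http://www.ecb.int/vocabulary/2002-08-01/eurofxref}'

xml_text = requests.get("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml").text
root = ET.fromstring(xml_text)

with open('banka.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['currency', 'rate'])
    for cube in root.iter(NS + 'Cube'):
        if 'currency' in cube.attrib:  # skip the outer and time-level Cube elements
            writer.writerow([cube.attrib['currency'], cube.attrib['rate']])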

Difference between 2 text files with unknown size [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form.
Closed 9 years ago.
Hoping to find a solution, I lost a couple of days, but with no success! I have two text files with many lines. One file can contain thousands of lines with numbers (one number per line), for example:
79357795
79357796
68525650
The second file also contains numbers, but not as many, maybe a hundred lines (again one number per line). I tried some "algorithms" but had no success. Now, my question is: can I check the first line from the first file against all lines from the second file, then check the second line from the first file against all lines from the second file, and so on up to the end of the file? As a result, I want to save the difference between these two files in a third file. Thank you all for your responses, and sorry for my bad English. :)
PS: Oh yes, I need to do this in Python.
More details:
first_file.txt contains:
79790104
79873070
69274656
69180377
60492209
78177852
79023241
69736256
68699620
79577311
78509545
69656007
68188871
60643247
78898817
79924105
79684143
79036022
69445507
60605544
79348181
69748018
69486323
69102802
68651099
second_file.txt contain:
78509545
69656007
68188871
60643247
78898817
79924105
79684143
79036022
69445507
60605544
79348181
69748018
69486323
69102802
68651099
79357794
78953958
69350610
78383111
68629321
78886856
third_file.txt needs to contain the numbers that do not exist in first_file.txt but do exist in the second file; in this case:
79357794
78953958
69350610
78383111
68629321
78886856
Something like:
from itertools import ifilterfalse  # Python 2; on Python 3 use itertools.filterfalse

with open('first_file.txt') as fst, open('second_file.txt') as snd, open('third_file.txt', 'w') as fout:
    fst_nums = set(int(line) for line in fst)
    # keep the numbers from the second file that do not appear in the first file
    snd_not_in_fst = ifilterfalse(fst_nums.__contains__, (int(line) for line in snd))
    fout.writelines(str(num) + '\n' for num in snd_not_in_fst)
Yes.
Edit: This will give you all numbers that are in both lists (which is what you first asked for.) See other answers for what your data set wants. (I Like 1_CR's answer.)
with open('firstfile.txt') as f:
    file1 = f.read().splitlines()
with open('secondfile.txt') as f:
    file2 = f.read().splitlines()

for x in file1:
    for y in file2:
        if x == y:
            print "Found matching: " + x
            # do what you want here
It could be made more efficient, but the files don't sound that big, and this is the simplest way.
Well, if I were you, I would load those files into two lists, and then iterate through one of them, looking up each value in the second one.
If the files are small enough to load into memory, sets are an option
with open('firstfile.txt') as f1, open('second_file.txt') as f2:
    print '\n'.join(set(f2.read().splitlines()).difference(f1.read().splitlines()))
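If third_file.txt should also preserve the line order of second_file.txt (a plain set is unordered), a small variant:

with open('first_file.txt') as f1:
    first = set(f1.read().splitlines())

with open('second_file.txt') as f2, open('third_file.txt', 'w') as out:
    for line in f2.read().splitlines():
        if line not in first:  # keep only the numbers missing from the first file
            out.write(line + '\n')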
