Python csv - fit cells' size to fit strings' length?

I'm writing a dictionary into a csv file using Python's csv library and a DictWriter object like so:
with open('test.csv', 'w', newline='') as fp:
    fieldnames = [<fieldnames for csv DictWriter>]
    dict_writer = csv.DictWriter(fp, fieldnames=fieldnames)
    dict_writer.writeheader()
    dict_writer.writerow(<some dictionary here>)
The result is that if I have a long name as one of the dictionary's values, the cells look like this:
But I want it to look like this:
Is there a way to fit the cells' size to the strings' lengths? I imagine this is a relatively common issue, since you don't want to resize them manually each time you open a csv file.
How can I fit it beforehand?

A CSV file does not contain formatting information; it only holds data. How the cells are sized is up to the viewer's configuration.
If you really need formatting, you should write in the file format of the receiving application (.ods, .xls, .xlsx, ...).
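For example, if the receiving application is Excel, a spreadsheet library can set explicit column widths. Here is a minimal sketch using the third-party openpyxl package (assumed installed); the "longest cell plus padding" heuristic is an approximation, since column width is measured in character units:
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

rows = [['name', 'comment'],
        ['A very long name that overflows its cell', 'short']]

wb = Workbook()
ws = wb.active
for row in rows:
    ws.append(row)

# rough auto-fit: widen each column to its longest cell, plus padding
for col in range(1, len(rows[0]) + 1):
    longest = max(len(str(r[col - 1])) for r in rows)
    ws.column_dimensions[get_column_letter(col)].width = longest + 2

wb.save('test.xlsx')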

Related

Array in a specific format in csv file using Python

I am trying to save the array inv_r in a specific format in a csv file. The current and the desired formats are attached.
import numpy as np
import csv

inv_r = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25]])
data = [inv_r]
with open('inv_r.csv', 'w') as f:
    writer = csv.writer(f)
    # write the data
    writer.writerows(zip(inv_r))
You need to remove the zip, first of all, so that each element gets its own cell.
To remove the blank lines between each row, you need to pass newline='' to the open call. See the footnote in the documentation for the csv library for an explanation of why this is necessary.
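Putting both fixes together, the corrected snippet might look like this:
import csv
import numpy as np

inv_r = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25]])

# newline='' avoids the extra blank lines on Windows;
# writerows(inv_r) writes one array row per csv row, one element per cell
with open('inv_r.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(inv_r)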

Import pipe delimited txt file into spark dataframe in databricks

I have a data file saved in .txt format which has a header row at the top and is pipe delimited. I am working in Databricks and need to create a Spark dataframe of this data, with all columns read in as StringType(), the headers defined by the first row, and the columns separated based on the pipe delimiter.
When importing .csv files I am able to set the delimiter and header options. However, I am not able to get the .txt files to import in the same way.
Example Data (completely made up)... for ease, please imagine it is just called datafile.txt:
URN|Name|Supported
12233345757777701|Tori|Yes
32313185648456414|Dave|No
46852554443544854|Steph|No
I would really appreciate a hand in getting this imported into a Spark dataframe so that I can crack on with other parts of the analysis. Thank you!
Any delimiter-separated file is a good candidate for csv reading methods; the 'c' of csv is mostly by convention. So nothing stops us from reading this:
col1|col2|col3
0|1|2
1|3|8
Like this (in pure python):
import csv
from pathlib import Path

with Path("pipefile.txt").open() as f:
    reader = csv.DictReader(f, delimiter="|")
    data = list(reader)
print(data)
Since whatever custom reader your libraries are using probably uses csv.reader under the hood, you simply need to figure out how to pass the right separator to it.
@blackbishop notes in a comment that
spark.read.csv("datafile.txt", header=True, sep="|")
would be the appropriate Spark call.
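That also covers the StringType() requirement: unless schema inference is switched on, Spark reads every csv column as a string. A minimal sketch, assuming an active SparkSession bound to spark (as there is in a Databricks notebook):
# `spark` is the ready-made SparkSession in a Databricks notebook
df = spark.read.csv("datafile.txt", header=True, sep="|")
df.printSchema()  # without inferSchema=True, every column is StringType
df.show()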

How can I open a csv file in python, and read one line at a time, without loading the whole csv file in memory?

I have a csv file whose size would not fit in the memory of my machine. So I want to open the csv file and then read its rows one at a time. I basically want to make a python generator that yields single rows from the csv.
Thanks in advance! :)
with open(filename, "r") as file:
    for line in file:
        doanything(line)  # process one raw line at a time
Python is lazy whenever possible. File objects are iterators and do not load the entire file, only one line at a time.
Solution:
You can use the chunksize parameter available in the pandas read_csv function:
import pandas as pd

chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    print(type(chunk))  # each chunk is a regular DataFrame
    # CODE HERE
Set chunksize to 1 and it should take care of your problem statement.
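For example, with chunksize=1 each chunk is a single-row DataFrame, so rows can be consumed one at a time:
import pandas as pd

for chunk in pd.read_csv(filename, chunksize=1):
    row = chunk.iloc[0]   # the chunk's single row, as a Series
    print(row.to_dict())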
My personal preference for doing this is with csv.DictReader
You set it up as an object with the parameters you need, and then, to read the file one row at a time, you either call next on it or iterate over it; each row comes back as a dictionary mapping your csv file's field names to values.
e.g.
import csv

with open('names.csv', newline='') as csvfile:
    my_reader = csv.DictReader(csvfile)
    first_row = next(my_reader)  # fetch a single row on demand
    for row in my_reader:        # ...or keep iterating lazily
        print([(k, v) for k, v in row.items()])
See the linked docs for parameter usage etc - it's fairly straightforward.
"a python generator that yields single rows from the csv."
This sounds like you want csv.reader from the built-in csv module. You will get one list per line of the file.
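Wrapped up as the generator the question asks for, a minimal sketch might look like this:
import csv

def csv_rows(path):
    """Yield one parsed csv row (a list of strings) at a time."""
    with open(path, newline='') as f:
        yield from csv.reader(f)

# usage: only one row is ever held in memory
for row in csv_rows('huge.csv'):
    print(row)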

trying to change delimiters in large csv files resulting in memory errors (python 3)

I have a bunch of comma-delimited files that I am trying to change to pipe-delimited files.
I am following the example provided here: Python CSV change separator
Here is my code:
import csv

print("setting new delimiter...")
reader = list(csv.reader(open(localfile, "rU"), delimiter=','))
writer = csv.writer(open(localfile, 'w'), delimiter='|', lineterminator='\n')
writer.writerows(row for row in reader)
I can't tell whether memory usage is cumulative or due to a specific file size, but either way, on my third file I get a memory error.
Since the third file is nearly the same size as the first two, it appears cumulative.
Is there a better way of doing this? Thanks
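The list(...) call is what materializes the entire file in memory; csv.writer.writerows accepts any iterable, so the conversion can be streamed row by row instead. A sketch, assuming it is acceptable to write to a separate output file rather than rewriting the input in place:
import csv

with open(localfile, newline='') as src, \
     open(localfile + '.out', 'w', newline='') as dst:
    reader = csv.reader(src, delimiter=',')
    writer = csv.writer(dst, delimiter='|', lineterminator='\n')
    writer.writerows(reader)  # streams one row at a time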

How to export data (which is as result of Python program) from command line?

I am working on a Python program, and I have results on the command line.
Now I need to do analysis on the results, so I need all the results exported in some format such as SQL, Excel, or CSV.
Can someone tell me how I can do that?
import csv

x1 = 1
x2 = 2
while True:
    show = [dict(x1=x1 + 1, x2=x2 + 2)]
    print('Received', show)
    with open('large1.csv', 'w') as f1:
        writer = csv.writer(f1, delimiter=' ', lineterminator='\n\n')
        writer.writerow(show)
    x1 = x1 + 1
    x2 = x2 + 1
This is an infinite loop, and I want a csv file containing two columns, x1 and x2, with all the values of x1 and x2 appended row-wise (one row per iteration).
But with this code I get a csv file named 'large1.csv' containing only one row (the last updated values of x1 and x2).
So how can I get all my values of x1 and x2 as rows?
Just use the csv format: it can easily be imported into Excel, and the Python standard library supports csv out of the box (see python-csv).
One way to handle this is to use the CSV module, and specifically a Writer object to write the output to a CSV file (perhaps even instead of writing to stdout). The documentation has several examples, including this one:
import csv

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
You should then be able to import the CSV file easily in Excel if that is what you want.
Have a look at the open() documentation.
Mode w truncates the file, which means it replaces the contents. Since you use it on every loop iteration, you are continuously deleting the file and replacing it with a new one. Mode a appends to the file and is probably what you want. You might also consider opening the file outside of the loop; in that case w would be the correct mode.
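A sketch of that last suggestion, opening the file once outside the loop (the header row and the bounded loop are illustrative additions, not part of the original program):
import csv

x1, x2 = 1, 2
with open('large1.csv', 'w', newline='') as f1:
    writer = csv.writer(f1)
    writer.writerow(['x1', 'x2'])  # header row (illustrative)
    for _ in range(10):            # bounded loop, for the example
        writer.writerow([x1, x2])  # one row per iteration
        x1 += 1
        x2 += 1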
