I write to a CSV file with this function:
def write_csv(hlavicka: Tuple[str, ...], zaznam: list, pomocne_csv: str) -> None:
    if not os.path.isfile(pomocne_csv):
        with open(pomocne_csv, "w", encoding=cfg.ENCODING, newline="") as soubor:
            writer = csv.writer(soubor, delimiter=cfg.DELIMITER)
            writer.writerow(hlavicka)
    with open(pomocne_csv, "a", encoding=cfg.ENCODING, newline="") as soubor:
        writer = csv.writer(soubor, delimiter=cfg.DELIMITER)
        writer.writerows([zaznam])
However, when I open the CSV in MS Office, long numbers are shown in scientific notation. For example, 102043292003060000 is displayed as 1.02E+17, even though I passed 102043292003060000 to my write_csv() function.
The problem is that when I read the CSV using:
def generuj_zaznamy(input_path):
    with open(input_path, "r", encoding="cp1250") as file_object:
        reader = csv.reader(file_object, delimiter=";")
        for entry in enumerate(reader, start=1):
            print(entry)
I get 1.02E+17 instead of 102043292003060000.
Is there a way to format the cell as a number directly in csv.writer or csv.reader? Thanks a lot.
If you open the CSV file in a text editor such as notepad.exe, you should see the long numbers written out in full. So the problem comes from Excel, not from csv.writer.
If you want to see the long numbers in full in Excel, create a new xlsx file, use Data -> Get External Data -> From Text to select the CSV file for import, and then set the data format of the column to Text.
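A quick way to confirm that csv.writer itself keeps every digit is a round trip through a temporary file. This is only a sketch: the helper name roundtrip_long_number, the file name check.csv and the UTF-8 encoding are assumptions made for the illustration, not part of the original code.

```python
import csv
import os
import tempfile


def roundtrip_long_number(value: int) -> str:
    """Write value with csv.writer, read it back with csv.reader, return the raw cell text."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "check.csv")  # hypothetical file name
        with open(path, "w", encoding="utf-8", newline="") as handle:
            csv.writer(handle, delimiter=";").writerow(["id", value])
        with open(path, "r", encoding="utf-8", newline="") as handle:
            row = next(csv.reader(handle, delimiter=";"))
    return row[1]  # csv.reader always returns strings


cell = roundtrip_long_number(102043292003060000)
print(cell)
```

If the cell comes back with all its digits here but reads as 1.02E+17 after the file has been opened and saved in Excel, the file was most likely rewritten by Excel rather than by Python.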
Edited:
I tried the code, and the problem also appears with pandas.DataFrame.to_csv(), not only with csv.writer(), once the number has 20 or more digits, which is outside the range of np.int64.
I read the official documentation, and it seems that the float_format argument can't solve this problem.
The workaround I can offer for now, provided you can read the original data as strings when the numbers have 20 or more digits, is this:
import numpy as np
import pandas as pd
import csv
df = pd.DataFrame(["3100000035155588379531799826432", "3100000035155588433002733375488", "3100000035155588355694446120960"])
df = "\t" + df
print(df)
df.to_csv("test.csv", index=False, header=False)
rng = np.random.default_rng(0)
big_nums = rng.random(10) * (10**19) # OverflowError once this reaches 10**20
df = pd.DataFrame(big_nums, dtype=np.int64).astype(str)
# df = "\t" + df
print(df)
df.to_csv("test.csv", index=False, header=False)
and the output will look like this:
0
0 \t3100000035155588379531799826432
1 \t3100000035155588433002733375488
2 \t3100000035155588355694446120960
0
0 6369616873214542848
1 2697867137638703104
2 409735239361946880
3 165276355285290944
4 8132702392002723840
5 9127555772777217024
6 6066357757671798784
7 7294965609839983616
8 5436249914654228480
9 -9223372036854775808
Hi, I'm trying to convert a .dat file to a .csv file, but I have a problem with it.
I have a .dat file whose column names look like this:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
Its delimiter is a tab.
I want to convert the .dat file to a .csv file, but the data keeps coming out in one column.
I tried the following code:
import pandas as pd
with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)
with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data ends up in one column.
(I want 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.)
I want to convert this into a CSV file that keeps the original data structure.
Where is the mistake?
Please help me. It's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with another delimiter:
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv", index=False)  # index=False avoids adding an extra index column
See also the documentation for read_csv and to_csv
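If you would rather not depend on pandas, the same conversion can be sketched with the standard csv module by reading with a tab delimiter and writing with the default comma. The function name tsv_to_csv and the sample rows are assumptions made for the illustration.

```python
import csv
import os
import tempfile


def tsv_to_csv(src_path: str, dst_path: str) -> int:
    """Copy a tab-delimited file to a comma-delimited one; return the number of rows written."""
    rows_written = 0
    with open(src_path, "r", newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")  # split on tabs only, not on spaces
        writer = csv.writer(dst)                  # default delimiter is a comma
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


with tempfile.TemporaryDirectory() as tmp_dir:
    dat_path = os.path.join(tmp_dir, "file.dat")
    csv_path = os.path.join(tmp_dir, "file.csv")
    with open(dat_path, "w", newline="") as f:
        f.write("region\tGPS name\tID\nnorth\tstop a\t7\n")  # made-up sample data
    row_count = tsv_to_csv(dat_path, csv_path)
    with open(csv_path, newline="") as f:
        converted_rows = list(csv.reader(f))
print(row_count, converted_rows)
```

Splitting only on tabs matters here: a bare split() also breaks on the spaces inside values, which is one way a file ends up with far more rows or cells than expected.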
Hello everyone, I am new to Python and still learning. I have a column in a CSV file with values like this example:
I want to split the programme column on the semicolon into two columns, for example:
program 1: H2020-EU.3.1.
program 2: H2020-EU.3.1.7.
This is what I wrote initially
import csv
import os
with open('IMI.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    with open('new_IMI.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter='\t')
        #for line in csv_reader:
        #    csv_writer.writerow(line)
Please note that after I split the columns, I need to write the file out again as a CSV and save it to my computer.
Please guide me.
Using .loc to iterate through each row of a dataframe is somewhat inefficient. Better to split an entire column, with the expand=True to assign to the new columns. Also as stated, easy to use pandas here:
Code:
import pandas as pd
df = pd.read_csv('IMI.csv')
df[['programme1','programme2']] = df['programme'].str.split(';', expand=True)
df.drop(['programme'], axis=1, inplace=True)
df.to_csv('IMI.csv', index=False)
Example of output:
Before:
print(df)
id acronym status programme topics
0 945358 BIGPICTURE SIGNED H2020-EU.3.1.;H2020-EU3.1.7 IMI2-2019-18-01
1 821362 EBiSC2 SIGNED H2020-EU.3.1.;H2020-EU3.1.7 IMI2-2017-13-06
2 116026 HARMONY SIGNED H202-EU.3.1. IMI2-2015-06-04
After:
print(df)
id acronym status topics programme1 programme2
0 945358 BIGPICTURE SIGNED IMI2-2019-18-01 H2020-EU.3.1. H2020-EU3.1.7
1 821362 EBiSC2 SIGNED IMI2-2017-13-06 H2020-EU.3.1. H2020-EU3.1.7
2 116026 HARMONY SIGNED IMI2-2015-06-04 H2020-EU.3.1. None
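For comparison, the same split can be sketched with only the csv module, which is where the question started. The helper name split_programme and the in-memory sample data are assumptions for the illustration; with a real file you would pass open('IMI.csv', newline='') to csv.DictReader instead of the StringIO object.

```python
import csv
import io


def split_programme(rows):
    """Yield a copy of each row with 'programme' replaced by 'programme1' and 'programme2'."""
    for row in rows:
        row = dict(row)
        first, _, second = row.pop("programme").partition(";")
        row["programme1"] = first
        row["programme2"] = second  # empty string when there is no semicolon
        yield row


# Made-up sample standing in for IMI.csv.
sample = (
    "id,acronym,programme\n"
    "945358,BIGPICTURE,H2020-EU.3.1.;H2020-EU.3.1.7.\n"
    "116026,HARMONY,H2020-EU.3.1.\n"
)
converted = list(split_programme(csv.DictReader(io.StringIO(sample))))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "acronym", "programme1", "programme2"])
writer.writeheader()
writer.writerows(converted)
print(out.getvalue())
```

str.partition splits on the first semicolon only, so a value with no semicolon simply leaves programme2 empty instead of raising an error.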
You can use the pandas library instead of csv.
import pandas as pd
df = pd.read_csv('IMI.csv')
p1 = {}
p2 = {}
for i in range(len(df)):
    if ';' in df['programme'].loc[i]:
        p1[df['id'].loc[i]] = df['programme'].loc[i].split(';')[0]
        p2[df['id'].loc[i]] = df['programme'].loc[i].split(';')[1]
df['programme1'] = df['id'].map(p1)
df['programme2'] = df['id'].map(p2)
and if you want to delete the programme column:
df = df.drop('programme', axis=1)  # drop returns a new DataFrame, so assign it back
To save the new csv file:
df.to_csv('new_file.csv', index=False)  # to_csv has no inplace argument
I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data.
You can specify a python write mode in the pandas to_csv function. For append it is 'a'.
In your case:
df.to_csv('my_csv.csv', mode='a', header=False)
The default mode is 'w'.
If the file initially might be missing, you can make sure the header is printed at the first write using this variation:
import os
output_path = 'my_csv.csv'
df.to_csv(output_path, mode='a', header=not os.path.exists(output_path))
You can append to a csv by opening the file in append mode:
with open('my_csv.csv', 'a') as f:
    df.to_csv(f, header=False)
If this was your csv, foo.csv:
,A,B,C
0,1,2,3
1,4,5,6
If you read that and then append, for example, df + 6:
In [1]: df = pd.read_csv('foo.csv', index_col=0)
In [2]: df
Out[2]:
A B C
0 1 2 3
1 4 5 6
In [3]: df + 6
Out[3]:
A B C
0 7 8 9
1 10 11 12
In [4]: with open('foo.csv', 'a') as f:
   ...:     (df + 6).to_csv(f, header=False)
foo.csv becomes:
,A,B,C
0,1,2,3
1,4,5,6
0,7,8,9
1,10,11,12
with open(filename, 'a') as f:
    df.to_csv(f, header=f.tell()==0)
Create file unless exists, otherwise append
Add header if file is being created, otherwise skip it
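The f.tell() == 0 check is not specific to pandas. As a minimal sketch, here is the same idea with the standard csv module, so it can be tried without a DataFrame. The helper name append_rows and the demo file are assumptions for the illustration.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to path, writing the header only when the file is new or empty."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if f.tell() == 0:  # position 0 in append mode means nothing has been written yet
            writer.writerow(header)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "foo.csv")
    append_rows(demo_path, ["A", "B", "C"], [[1, 2, 3], [4, 5, 6]])  # creates the file
    append_rows(demo_path, ["A", "B", "C"], [[7, 8, 9]])             # appends, no header
    with open(demo_path, newline="") as f:
        demo_rows = list(csv.reader(f))
print(demo_rows)
```

Compared with os.path.exists, this check also writes the header when the file exists but is empty.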
A little helper function I use with some header checking safeguards to handle it all:
def appendDFToCSV_void(df, csvFilePath, sep=","):
    import os
    if not os.path.isfile(csvFilePath):
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep)
    elif len(df.columns) != len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns):
        raise Exception("Columns do not match!! Dataframe has " + str(len(df.columns)) + " columns. CSV file has " + str(len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns)) + " columns.")
    elif not (df.columns == pd.read_csv(csvFilePath, nrows=1, sep=sep).columns).all():
        raise Exception("Columns and column order of dataframe and csv file do not match!!")
    else:
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep, header=False)
I started with PySpark DataFrames and got type conversion errors when converting them to pandas DataFrames and then appending to the CSV, because of the schema/column types in the PySpark DataFrames.
I solved the problem by forcing all columns in each DataFrame to type string and then appending to the CSV as follows:
with open('testAppend.csv', 'a') as f:
    df2.toPandas().astype(str).to_csv(f, header=False)
This is how I did it in 2021
Let us say I have a csv sales.csv which has the following data in it:
sales.csv:
Order Name,Price,Qty
oil,200,2
butter,180,10
and to add more rows I can load them in a data frame and append it to the csv like this:
import pandas
data = [
['matchstick', '60', '11'],
['cookies', '10', '120']
]
dataframe = pandas.DataFrame(data)
dataframe.to_csv("sales.csv", index=False, mode='a', header=False)
and the output will be:
Order Name,Price,Qty
oil,200,2
butter,180,10
matchstick,60,11
cookies,10,120
A bit late to the party but you can also use a context manager, if you're opening and closing your file multiple times, or logging data, statistics, etc.
from contextlib import contextmanager
import pandas as pd
@contextmanager
def open_file(path, mode):
    file_to = open(path, mode)
    try:
        yield file_to
    finally:
        file_to.close()  # close the file even if the body raises
## later
saved_df = pd.DataFrame(data)
with open_file('yourcsv.csv', 'a') as outfile:
    saved_df.to_csv(outfile, header=False)
I have a CSV file with only a single column, which acts as my input.
I use that input to compute my outputs. There are multiple outputs, and I need to write them to another CSV file.
Can anyone suggest how to do this?
Here is the code:
import urllib.request
jd = {input 1}
//
Some Codes to find output - a,b,c,d,e
//
** Code to write output to a csv file.
** Repeat the code with next input of input csv file.
Input CSV File has only a single column and is represented below:
1
2
3
4
5
The output would go to a separate CSV file in the format below,
with multiple rows and multiple columns:
a b c d e
Here is a simple example:
data.csv is a CSV file with one column and multiple rows.
results.csv contains the mean and median of the input and is a CSV file with 1 row and 2 columns (the mean is in the 1st column and the median in the 2nd column).
Example:
import numpy as np
import pandas as pd
import csv
# load the data
data = pd.read_csv("data.csv", header=None)
# calculate things for the 1st column that has the data
calculate_mean = np.mean(data.loc[:, 0])
calculate_median = np.median(data.loc[:, 0])
results = [calculate_mean, calculate_median]
# write results to csv
row = []
for result in results:
    row.append(result)
# in Python 3 the csv module expects text mode with newline="", not "wb"
with open("results.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(row)
In pseudo code, you'll do something like this:
for each_file in a_folder_that_contains_csv:  # go through all the `inputs` - csv files
    with open(each_file) as csv_file, open(other_file, 'w') as output_file:  # open each csv file, and a new csv file for writing
        process_the_input_from_each_csv  # process the data you read from the csv_file
        export_to_output_file  # export the data to the new csv file
Now, I won't write a full-working example because it's better for you to start digging and ask specific questions when you have some. You're now just asking: write this for me because I don't know Python.
here is the official documentation
here you can read about the csv module
here you can read about the os module
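As a minimal, runnable sketch of the pseudocode above for a single input file: read one value per row, compute the outputs, and write one output row per input. The function compute_outputs is a hypothetical stand-in for whatever calculation produces a, b, c, d, e, and the file names are made up for the demo.

```python
import csv
import os
import tempfile


def compute_outputs(value):
    # Hypothetical placeholder for the real calculation that yields a, b, c, d, e.
    return [value, value * 2, value * 3, value * 4, value * 5]


def process_file(input_path, output_path):
    """Read one value per row from input_path and write one output row per input."""
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["a", "b", "c", "d", "e"])
        for row in csv.reader(src):
            if not row:  # skip blank lines
                continue
            writer.writerow(compute_outputs(int(row[0])))


with tempfile.TemporaryDirectory() as tmp_dir:
    input_path = os.path.join(tmp_dir, "input.csv")
    output_path = os.path.join(tmp_dir, "output.csv")
    with open(input_path, "w", newline="") as f:
        f.write("1\n2\n3\n")
    process_file(input_path, output_path)
    with open(output_path, newline="") as f:
        output_rows = list(csv.reader(f))
print(output_rows)
```

Opening the output file once, outside the loop, keeps all results in one file; opening it in "w" mode inside the loop would overwrite it on every input.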
I think you need read_csv to read the file into a Series, then loop over it with Series.items and call to_csv to write each output to a file.
#file content
1
3
5
# the squeeze= argument of read_csv and Series.iteritems were removed in pandas 2.0
s = pd.read_csv('file', names=['a']).squeeze('columns')
print (s)
0 1
1 3
2 5
Name: a, dtype: int64
for i, val in s.items():
    #print (val)
    #some operation with scalar value val
    df = pd.DataFrame({'a':np.arange(val)})
    df['a'] = df['a'] * 10
    print (df)
    #write to csv, file name by val
    df.to_csv(str(val) + '.csv', index=False)
a
0 0
a
0 0
1 10
2 20
a
0 0
1 10
2 20
3 30
4 40