My code writes a txt file with lots of data. I'm trying to print a pandas dataframe into that txt file as part of my code but can't use .write() as that only accepts strings.
How do I take a pandas dataframe, stored as DF1 for example, and print it in the file?
I've seen similar questions, but those are aimed at creating a txt file solely for the dataframe. I just want my dataframe to appear in the txt file my code already writes.
Use the to_string method; you can then call write with the file opened in append mode ('a'):
tfile = open('test.txt', 'a')
tfile.write(df.to_string())
tfile.close()
Sample Data
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': np.arange(1,6,1),
                   'val': list('ABCDE')})
test.txt
This line of text was here before.
Code
tfile = open('test.txt', 'a')
tfile.write(df.to_string())
tfile.close()
Output: test.txt
This line of text was here before.
id val
0 1 A
1 2 B
2 3 C
3 4 D
4 5 E
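The same idea also works when the table sits between other pieces of text. A minimal sketch, assuming a made-up file name (report.txt) and made-up surrounding lines; the with block closes the file even if an error occurs:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'val': list('ABC')})

# Hypothetical output path; a temporary directory keeps the example side-effect free.
path = os.path.join(tempfile.mkdtemp(), 'report.txt')

with open(path, 'w') as tfile:
    tfile.write('Some text written before the table.\n')
    # to_string() returns the whole table as one str, so write() accepts it.
    tfile.write(df.to_string(index=False))
    # to_string() does not end with a newline, so add one before more text.
    tfile.write('\nSome text written after the table.\n')

with open(path) as tfile:
    content = tfile.read()
print(content)
```

Passing index=False is optional; it only drops the 0, 1, 2, ... column from the printed table.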
Pandas DataFrames have to_string(), to_json() and to_csv() methods that may be helpful to you, see:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_string.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
Example of writing a string to a text file. Use the 'w' flag to overwrite the file and 'a' to append to it.
example_string = df1.to_string()
output_file = open('file.txt','a')
output_file.write(example_string)
output_file.close()
If you only want to put certain information in the text file, you can select it first with pandas or json methods; see the docs links above as well.
Before the OP commented about appending, I originally wrote an example about json. The json module supports a dump() method to help write to a file. However, in most cases it's not the ideal format to keep appending output to, compared with csv or txt. In case it's useful to anyone:
import json
filename = 'file.json'
with open(filename, 'w') as file:
    # to_json() already returns a JSON string; parse it first so dump() does not encode it twice
    json.dump(json.loads(df1.to_json()), file)
Related
Python 3.8.5 Pandas 1.1.3
I'm using the following to loop through json files and create csv files:
import os
import glob
import pandas as pd
def stuff():
    results_list = []
    for filepath in glob.iglob('/Users/me/data/*.json'):
        filename = str(filepath)
        file = open(filepath,"r")
        data = file.read()
        df = pd.json_normalize(data, 'main')
        df.to_csv(filename + '.csv')
        file.close()
        results_list.append(data)
    return results_list
The format of the resulting csv files fits my requirements exactly without having to pass any additional params to the to_csv method - when viewing the csv file in Excel, row 1 is the keys as the headers, and column 1 is the index numbers. Exactly what I need. Cell A1 is blank.
One final step that I need to accomplish is to write the filename variable value to the csv file. Ideally I'd like to put it in cell A1, if possible. Can I accomplish this solely with to_csv or am I going to need to get into csv.writer world?
You can exploit the index name for that purpose:
df.rename_axis('somename').to_csv()
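To see why this works: to_csv writes the index name as the header of the index column, which is the cell Excel shows as A1. A small sketch with made-up data; the value of filename is a hypothetical stand-in for the variable in the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Hypothetical file name standing in for the question's `filename` variable.
filename = 'data1.json'

# With no path argument, to_csv() returns the CSV text instead of writing a file.
csv_text = df.rename_axis(filename).to_csv()
print(csv_text)
```

The first line of the output starts with the file name, followed by the column headers.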
Trying to whip this up in Python. Long story short, I have a csv file that contains column data I need to inject into another file that is pipe-delimited. My understanding is that Python can't replace values in place, so I have to rewrite the whole file with the new values.
data file(csv):
value1,value2,iwantthisvalue3
source file(txt, | delimited)
value1|value2|iwanttoreplacethisvalue3|value4|value5|etc
fixed file(txt, | delimited)
samevalue1|samevalue2| replacedvalue3|value4|value5|etc
I can't figure out how to accomplish this. This is my latest attempt(broken code):
import re
import csv
result = []
row = []
with open("C:\data\generatedfixed.csv","r") as data_file:
for line in data_file:
fields = line.split(',')
result.append(fields[2])
with open("C:\data\data.txt","r") as source_file, with open("C:\data\data_fixed.txt", "w") as fixed_file:
for line in source_file:
fields = line.split('|')
n=0
for value in result:
fields[2] = result[n]
n=n+1
row.append(line)
for value in row
fixed_file.write(row)
I would highly suggest you use the pandas package here, it makes handling tabular data very easy and it would help you a lot in this case. Once you have installed pandas import it with:
import pandas as pd
To read the files simply use:
data_file = pd.read_csv(r"C:\data\generatedfixed.csv")
source_file = pd.read_csv(r"C:\data\data.txt", delimiter="|")
After that, manipulating the two files is easy. I'm not sure how many values, or which ones, you want to replace, but if the "iwantthisvalue3" and "iwanttoreplacethisvalue3" columns have the same length, this should do the trick:
source_file['iwanttoreplacethisvalue3'] = data_file['iwantthisvalue3']
Now all you need to do is save the dataframe (the table we just updated) to a file. Since you want a .txt file with "|" as the delimiter, this line does that (you can customize the output in many other ways):
source_file.to_csv(r"C:\data\data_fixed.txt", sep='|', index=False)
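The lines above address columns by name, which assumes both files start with a header row. If they do not (the samples in the question may be plain data rows), the columns can be addressed by position instead. A rough sketch under that assumption, using in-memory stand-ins with made-up contents rather than the real files:

```python
import io

import pandas as pd

# In-memory stand-ins for the question's two files (contents are made up).
data_csv = io.StringIO("a1,a2,NEW1\nb1,b2,NEW2\n")
source_txt = io.StringIO("a1|a2|old1|a4|a5\nb1|b2|old2|b4|b5\n")

# header=None: the first line is data, and columns are labelled 0, 1, 2, ...
data_file = pd.read_csv(data_csv, header=None, dtype=str)
source_file = pd.read_csv(source_txt, sep='|', header=None, dtype=str)

# Overwrite the third column (label 2); assumes both files have the same
# number of rows in the same order.
source_file[2] = data_file[2]

# header=False and index=False keep the output in the original layout.
fixed = source_file.to_csv(sep='|', header=False, index=False)
print(fixed)
```

If the rows are not in the same order in both files, a merge on the shared key columns would be needed instead of a plain column assignment.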
Let me know if everything works and whether this helped you. I would also encourage you to read up on pandas (or watch some videos) if you're planning to work with tabular data; it is an awesome library with great documentation and functionality.
So I have several tables in csv format, and I am using Python and the csv module. I want to extract a particular value, let's say column=80 row=109.
Here is a random example:
import csv
with open('hugetable.csv', 'r') as file:
    reader = list(csv.reader(file))
print(reader[109][80])
I am doing this many times with large tables and I would like to avoid loading the whole table into an array (line 2 above) to ask for a single value. Is there a way to open the file, load the specific value and close it again? Would this process be more efficient than what I have done above?
Thanks for all the answers, all answers so far work pretty well.
You could try reading the file without csv library:
row = 108
column = 80
with open('hugetable.csv', 'r') as file:
    header = next(file)
    for _ in range(row-1):
        _ = next(file)
    line = next(file)
print(line.strip().split(',')[column])
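A variation on the same idea keeps the csv module, so quoted fields that contain commas are still parsed correctly, and uses itertools.islice to skip rows lazily. This is only a sketch: read_cell is a hypothetical helper name, and the small in-memory table stands in for hugetable.csv.

```python
import csv
import io
from itertools import islice


def read_cell(file, row, column):
    """Return one cell; row and column are 0-based and the header counts as row 0."""
    reader = csv.reader(file)
    # islice skips the first `row` records lazily, without building a list.
    record = next(islice(reader, row, None), None)
    if record is None:
        raise IndexError(f'file has fewer than {row + 1} rows')
    return record[column]


# Small in-memory table standing in for hugetable.csv.
table = io.StringIO('h0,h1,h2\nr1c0,r1c1,r1c2\nr2c0,"r2,c1",r2c2\n')
value = read_cell(table, 2, 1)
print(value)
```

The file is still read sequentially up to the requested row, since csv rows have no fixed length to seek by, but nothing beyond that row is read and no full table is held in memory.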
You can try pandas to load only certain columns of your csv file
import pandas as pd
pd.read_csv('foo.csv',usecols=["column1", "column2"])
You could use pandas to load it
import pandas as pd
text = pd.read_csv('Book1.csv', sep=',', header=None, skiprows= 100, nrows=3)
print(text[50])
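The skiprows, nrows and usecols parameters can be combined so that pandas parses a single cell. A minimal sketch, assuming a file without a header row and 0-based row and column positions; the in-memory table is made up for illustration:

```python
import io

import pandas as pd

# Made-up 10 x 5 table where each cell records its own position.
table = io.StringIO(
    '\n'.join(','.join(f'r{r}c{c}' for c in range(5)) for r in range(10))
)

row, column = 7, 3

# Skip the first `row` lines, parse one line, and keep one column of it.
cell = pd.read_csv(
    table, header=None, skiprows=row, nrows=1, usecols=[column]
).iat[0, 0]
print(cell)
```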
I have many txt files (which have been converted from pdf) in a folder. I want to create a csv/excel dataset where each text file will become a row. Right now I am opening the files in pandas dataframe and then trying to save it to a csv file. When I print the dataframe, I get one row per txt file. However, when saving to csv file, the texts get broken and create multiple rows/lines for each txt file rather than just one row. Do you know how I can solve this problem? Any help would be highly appreciated. Thank you.
Following is the code I am using now.
import glob
import os
import pandas as pd
file_list = glob.glob(os.path.join(os.getcwd(), "K:\\text_all", "*.txt"))
corpus = []
for file_path in file_list:
    with open(file_path, encoding="latin-1") as f_input:
        corpus.append(f_input.read())
df = pd.DataFrame({'col':corpus})
print (df)
df.to_csv('K:\\out.csv')
Update
If this solution is not possible, it would also help to transform the data a bit in the pandas dataframe. I want to create a column with the names of the txt files, so that the name of each txt file in the folder becomes the identifier of the respective text. I will then save it in tsv format so that the lines do not get separated because of commas, as suggested by someone here.
I need something like following.
identifier col
txt1 example text in this file
txt2 second example text in this file
...
txtn final example text in this file
Use the quoting parameter of to_csv:
import csv
df.to_csv('K:\\out.csv', quoting=csv.QUOTE_ALL)
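For the update in the question (an identifier column taken from the file names, saved as tsv), something along these lines may work. It is a sketch with made-up paths and texts rather than the real K:\text_all folder, and the round trip at the end only checks that each text comes back as one row:

```python
import csv
import io
import os

import pandas as pd

# Made-up file paths and contents standing in for the texts read from disk.
texts = {
    os.path.join('text_all', 'txt1.txt'): 'first line\nsecond line, with a comma',
    os.path.join('text_all', 'txt2.txt'): 'another "quoted" text',
}

rows = [
    # The file name without its extension becomes the identifier.
    {'identifier': os.path.splitext(os.path.basename(path))[0], 'col': text}
    for path, text in texts.items()
]
df = pd.DataFrame(rows)

# Tab-separated output; QUOTE_ALL wraps every field in quotes, so embedded
# newlines stay inside one field.
tsv_text = df.to_csv(sep='\t', index=False, quoting=csv.QUOTE_ALL)

# Reading it back gives one row per text file again.
round_trip = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(round_trip)
```

A plain text editor will still show the embedded line breaks as separate lines; a csv-aware reader treats each quoted field as a single cell.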
I created a csv file in my Python code and want to append more data to it, but I get this error:
io.UnsupportedOperation: not readable
The code I tried is:
df.to_csv('timepass.csv', index=False)
with open(r'timepass.csv', 'a') as f:
    writer = csv.reader(f)
    your_list = list(writer)
print(your_list)
I want to append the new data and store it in the same csv file, so that the file contains both the previous and the current data.
Please help me figure this out.
Thanks in advance.
It is simple; just try this:
import pandas as pd
df = pd.read_excel("NSTT.xlsx","Sheet1") #reading Excel
print(df) #Printing data frame
df.to_excel("new.xlsx") #Writing Dataframe into New Excel file
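Note that to_excel has no append mode: a second positional argument is taken as the sheet name. If you want to add data to an existing workbook, open it with pd.ExcelWriter in append mode (this needs the openpyxl engine) and write to a new sheet:
with pd.ExcelWriter("new.xlsx", mode="a", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet2")
There is no need to convert the data to a list: you can access values in the data frame directly, just as with a list; you only have to specify the location.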
Please check this.
You can use pandas in python to read csv and write csv:
import pandas as pd
df = pd.read_csv("csv file")
print(df)
Try:
with open(r'timepass.csv', 'r') as f:
    reader = list(csv.reader(f))
print(reader)
Here you open the file with 'r', which means read-only, and assign the list contents to reader with list(csv.reader(f)). Your earlier code uses 'a', which opens the file for appending only. The documentation describes it as:
'a' opens the file for appending; any data written to the file is
automatically added to the end
so it does not support read().
If you want to append data from a different list to the csv file, open the file with 'a' and use a csv.writer:
with open('lake.csv','a', newline='') as f:
    csv.writer(f).writerow([1,2,3]) #dummy list [1,2,3]
Or directly from the pandas.DataFrame.to_csv method from your new dataframe, with header = False so as not to append headers:
df.to_csv('timepass.csv', index=False)
df_new.to_csv(r'timepass.csv', mode='a', header=False, index=False) #once you have updated your dataframe, you can directly append it to the same csv file; index=False matches the first write
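When the same code path handles both the first write and later appends, the header can be written only if the file does not exist yet. A minimal sketch; append_to_csv is a hypothetical helper name and the temporary path stands in for timepass.csv:

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, path):
    """Append df to path, writing the header row only when the file is new."""
    write_header = not os.path.exists(path)
    df.to_csv(path, mode='a', header=write_header, index=False)


# Hypothetical path in a temporary directory, standing in for timepass.csv.
path = os.path.join(tempfile.mkdtemp(), 'timepass.csv')

append_to_csv(pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']}), path)
append_to_csv(pd.DataFrame({'x': [3], 'y': ['c']}), path)

# The file now holds the previous and the current data under one header.
combined = pd.read_csv(path)
print(combined)
```

This assumes every appended dataframe has the same columns in the same order, since later writes carry no header to match against.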
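You can use pandas to append two csv files quickly:
import pandas as pd
dataframe1=pd.read_csv("a.csv")
dataframe2=pd.read_csv("b.csv")
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
dataframe1=pd.concat([dataframe1, dataframe2])
dataframe1=dataframe1.reset_index(drop=True)
# index=False avoids writing the row numbers as an extra column
dataframe1.to_csv("a.csv", index=False)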