Python - create CSV file - python

I am using the code below to create a file using Python. I don't get any error message when I run it, but no file gets created either:
df_csv = pd.read_csv (r'X:\Google Drive\Personal_encrypted\Training\Ex_Files_Python_Excel\Exercise Files\names.csv', header=None)
df_csv.to_csv = (r"C:\temp\modified_names.csv")

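You are assigning a string to the attribute df_csv.to_csv (parentheses around a single string do not make a tuple), which replaces the method instead of calling it. That is why you get neither an error nor a file.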
Solution:
df_csv.to_csv(r"C:\temp\modified_names.csv")
DataFrame.to_csv documentation here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
Edit: I also noticed the title says "Create Excel File"
To do that you would do the following:
df_csv.to_excel(r"C:\temp\modified_names.xlsx")
Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
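To see why the assignment in the question fails silently, here is a minimal sketch. The Report class is a hypothetical stand-in for a DataFrame (it is not part of pandas) and is used only so the example runs without any files or third-party packages.

```python
class Report:
    """Hypothetical stand-in for a DataFrame, used only for illustration."""

    def to_csv(self, path):
        return "would write to {0}".format(path)


# Parentheses around one string are just grouping; a tuple needs a comma.
grouped = (r"C:\temp\modified_names.csv")
one_tuple = (r"C:\temp\modified_names.csv",)

report = Report()
# Assignment: stores a string on the instance under the method's name.
# Python raises no error here, and the method is never called.
report.to_csv = (r"C:\temp\modified_names.csv")
still_callable = callable(report.to_csv)

# Call: this is what actually invokes the method.
fresh = Report()
result = fresh.to_csv(r"C:\temp\modified_names.csv")
```

After the assignment, report.to_csv is an ordinary string, so nothing was ever written.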

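I usually make the .csv file like this:
import csv
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(FILENAME, 'w', newline='') as file:
    csv_write = csv.writer(file, delimiter='\t')
    csv_write.writerow(LINE)
LINE is a list holding the values of the row you want to write. Note that delimiter='\t' produces tab-separated output; omit it to get the default comma.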
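As a fuller sketch of the same csv-module approach, the example below writes several rows at once and reads them back. The sample rows and the temporary output location are assumptions made so the code runs anywhere.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the first row serves as the header.
rows = [
    ["name", "age"],
    ["Alice", "30"],
    ["Bob", "25"],
]

# Write into a temporary directory so the sketch does not depend on C:\temp.
out_path = os.path.join(tempfile.mkdtemp(), "modified_names.csv")

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)   # default delimiter is a comma
    writer.writerows(rows)   # writes every row in one call

# Read the file back to confirm what was written.
with open(out_path, newline="") as f:
    read_back = list(csv.reader(f))
```

writerows takes an iterable of rows, whereas writerow writes a single row.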

Related

How to write text files with sys.argv[] as part of the filename?

I am pretty new to Python and I am trying to filter some rows in a dataframe based on whether they contain strings or not. I want the script to automatically use the input name to save the filtered dataframe on a text file.
Suppose I read my file with python3 code.py input.txt and my code looks like this:
#!/usr/bin/python3
import pandas as pd
import sys
data = pd.read_csv(sys.argv[1], sep='\t', header=0)
selectedcols = data['Func.refGene']
selectedrows = selectedcols.str.contains("exonic|splicing")
selecteddata = data[selectedrows]
selecteddata.to_csv(f'{sys.argv[1][:-4]}_exonic.splicing.txt', index=None, sep='\t', mode = 'a')
Where 'Func.refGene' is the column I want to search through for the strings "exonic" and "splicing". I have written this code and it worked before, but now I try to run it and the following error occurs:
File "code.py", line 12
selecteddata.to_csv(f'{sys.argv[1][:-4]}_exonic.splicing.txt', index=None, sep='\t', mode = 'a')
^
SyntaxError: invalid syntax
Would anyone know what could be wrong? I have searched for this syntax and haven't had any success.
For Python versions below 3.6, try this:
selecteddata.to_csv('{0}_exonic.splicing.txt'.format(sys.argv[1][:-4]), index=None, sep='\t', mode = 'a')
f-strings are only supported from Python 3.6 onward, so an older interpreter reports them as a SyntaxError: https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals
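The [:-4] slice assumes the input name always ends in a three-letter extension. A possible alternative, sketched below, uses os.path.splitext to strip whatever extension is present. The helper name exonic_output_name is hypothetical, and str.format is used so the code also runs on interpreters older than 3.6.

```python
import os


def exonic_output_name(input_path):
    """Hypothetical helper: 'input.txt' -> 'input_exonic.splicing.txt'."""
    # splitext drops only the final extension, whatever its length.
    root, _ext = os.path.splitext(input_path)
    return "{0}_exonic.splicing.txt".format(root)


name_txt = exonic_output_name("input.txt")
name_long_ext = exonic_output_name("sample.table")
```

In the script from the question, the result of exonic_output_name(sys.argv[1]) would be passed as the first argument of to_csv.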

pandas dataframe to excel

I am trying to save a pandas DataFrame to an Excel file. After several methods that scrape the data, the final method writes the data to an Excel file.
The problem is that I want the sheet_name to be an input variable for each scrape I do.
But with the code below, I got the error:
ValueError: No engine for filetype: ''
def datacollection(self,filename):
    tbl= self.find_element_by_xpath("/html/body/form/div[3]/div[2]/div[3]/div[3]/div[1]/table").get_attribute('outerHTML')
    df=pd.read_html(tbl)
    print(df[0])
    print(type(df[0]))
    final=pd.DataFrame(df[0])
    final.to_excel(r'C:\Users\ADMIN\Desktop\PROJECTS\Python',sheet_name=f'{filename}')
I believe the problem here is that you are asking it to write to a file called Python, without any file extension.
You could name it Python.xlsx for example.
Or, if Python was the directory name, then it should be Python/somefilename.xlsx
EDIT: Given that you were trying to name the file after filename, you are using the sheet_name parameter wrong, which names the sheet instead of the file. Ditch the sheet_name and change the last line to:
final.to_excel(fr'C:\Users\ADMIN\Desktop\PROJECTS\Python\{filename}.xlsx')
You need to give a file extension for the excel file:
final.to_excel(r'C:\Users\ADMIN\Desktop\PROJECTS\Python.xlsx',sheet_name=f'{filename}')
SOLUTION:
With a plain f'...' string (no r prefix), the backslashes in the path must be changed to forward slashes, because sequences such as \U are otherwise treated as escape sequences:
def datacollection(self,filename):
    tbl= self.find_element_by_xpath("/html/body/form/div[3]/div[2]/div[3]/div[3]/div[1]/table").get_attribute('outerHTML')
    df=pd.read_html(tbl)
    print(df[0])
    print(type(df[0]))
    final=pd.DataFrame(df[0])
    final.to_excel(f'C:/Users/ADMIN/Desktop/PROJECTS/Python/{filename}.xlsx')
This might solve the error, provided the string is raw (note the r prefix) so that the backslashes are kept literally:
final.to_excel(rf'C:\Users\ADMIN\Desktop\PROJECTS\Python\{filename}.xlsx')
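The path handling discussed in these answers can be sketched without pandas. The helper name excel_path and the file name "scrape1" are hypothetical; PureWindowsPath is used so the example gives the same result on any operating system.

```python
from pathlib import PureWindowsPath


def excel_path(folder, filename):
    """Hypothetical helper: join a Windows folder and a base name, adding .xlsx."""
    return str(PureWindowsPath(folder) / "{0}.xlsx".format(filename))


target = excel_path(r"C:\Users\ADMIN\Desktop\PROJECTS\Python", "scrape1")

# A plain (non-raw) literal containing \U is rejected when the source is
# compiled, because \U starts an 8-digit Unicode escape.
try:
    compile("p = 'C:\\Users\\ADMIN'", "<demo>", "exec")
    plain_literal_ok = True
except SyntaxError:
    plain_literal_ok = False
```

The resulting string, which ends in .xlsx, is what to_excel needs in order to pick a writer engine.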

read csv file using string from another df(pandas, python, dataframe)

Is it possible to read a CSV file using a string from another df?
Normally, to read a CSV file, I'd use the following code:
df = pd.read_csv("C:/Users/Desktop/file_name.csv")
However, I'd like to automate reading a CSV file using a string from another df:
df1_string = df1.iloc[0]['file_name']
df2 = pd.read_csv("C:/Users/Desktop/df1_string.csv")
I got a FileNotFoundError when I tried the above code:
FileNotFoundError: [Errno 2] File b'C:/Users/Desktop/df1_string,csv' does not exist
Kindly advise, many thanks.
Use Python string formatting (an f-string) so that the value of df1_string, not its name, ends up in the path:
df1_string = df1.iloc[0]['file_name']
df2 = pd.read_csv(f"C:/Users/Desktop/{df1_string}.csv")
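The difference between the two path strings can be checked without any files. In this small sketch, the value "sales_2020" is a hypothetical stand-in for df1.iloc[0]['file_name'].

```python
# Hypothetical value standing in for df1.iloc[0]['file_name'].
df1_string = "sales_2020"

# Inside ordinary quotes the variable name is just text.
literal_path = "C:/Users/Desktop/df1_string.csv"

# With the f prefix, the braces are replaced by the variable's value.
formatted_path = f"C:/Users/Desktop/{df1_string}.csv"
```

Only formatted_path points at the file named in the other dataframe.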

Python: How to create a new dataframe with first row when a specific value

I am reading csv files into python using:
df = pd.read_csv(r"C:\csvfile.csv")
But the file has some summary data, and the raw data starts when the value "valx" is found. If "valx" is not found, the file is useless. I would like to create new dataframes that start where "valx" is found. I have been trying for a while with no success. Any help on how to achieve this is greatly appreciated.
Unfortunately, pandas' skiprows has to be told which rows to skip; it cannot search the file for a value. You might want to parse the file before creating the dataframe.
As an example:
import csv
with open(r"C:\csvfile.csv", "r", newline="") as f:
    lines = list(csv.reader(f))  # read every row into a list
# index of the first row that has 'valx' as a cell, or None if there is none
start = next((n for n, row in enumerate(lines) if 'valx' in row), None)
if start is not None:
    data = lines[start:]  # rows from the 'valx' row onward
Using the standard library csv module, you can read the file and check whether valx is in it. If it is found, the rows from that point on are stored in the data variable.
From there you can use the data variable to create your dataframe.
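A variation on this idea is to locate the marker row first and then hand its index to pandas. The sketch below uses only the standard library. The helper name first_row_with and the sample file contents are hypothetical, and the approach assumes every CSV row occupies exactly one physical line (no quoted line breaks).

```python
import csv
import os
import tempfile


def first_row_with(path, marker):
    """Return the index of the first CSV row containing `marker` as a cell, or None."""
    with open(path, newline="") as f:
        for index, row in enumerate(csv.reader(f)):
            if marker in row:
                return index
    return None


# Hypothetical file: two summary lines, then the raw data starting at 'valx'.
sample_path = os.path.join(tempfile.mkdtemp(), "csvfile.csv")
with open(sample_path, "w", newline="") as f:
    f.write("summary,1\nsummary,2\nvalx,valy\n10,20\n")

start = first_row_with(sample_path, "valx")
missing = first_row_with(sample_path, "not_there")

# With pandas installed, the raw data could then be loaded with:
# df = pd.read_csv(sample_path, skiprows=start)
```

A result of None signals that the file does not contain the marker and can be discarded.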

CParserError: Error tokenizing data

I'm having some trouble reading a csv file
import pandas as pd
df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)
I get
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5
and when I add sep=None to df I get another error
Error: line contains NULL byte
I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.
The CSV file is totally fine; I checked it and I see nothing wrong with it.
Here are the errors I get:
In your actual code, the line is:
>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)
You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.
Excel files (.xlsx) are zipped XML packages, a binary format that cannot be read as simple text files (like CSV files).
You need to either convert the Excel file to CSV (note: if it has multiple sheets, each sheet should be converted to its own CSV file) and then read those, or read it directly with read_excel or a library designed for Excel formats such as xlrd; see Reading/parsing Excel (xls) files with Python for more information on that.
Use read_excel instead of read_csv if it is an Excel file:
import pandas as pd
df = pd.read_excel("Data_Matches_tekha.xlsx")
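When a file's extension cannot be trusted, a rough check is possible with the standard library, since .xlsx files are ZIP archives and plain CSV files are not. The helper name looks_like_xlsx and the two sample files are hypothetical. Any ZIP archive passes this check, so treat it as a hint rather than proof.

```python
import os
import tempfile
import zipfile


def looks_like_xlsx(path):
    """Rough check: .xlsx files are ZIP archives, plain CSV files are not."""
    return zipfile.is_zipfile(path)


workdir = tempfile.mkdtemp()

# Hypothetical stand-ins: a real CSV and a ZIP archive mislabelled as .csv.
csv_path = os.path.join(workdir, "real.csv")
with open(csv_path, "w") as f:
    f.write("a,b\n1,2\n")

fake_csv_path = os.path.join(workdir, "actually_excel.csv")
with zipfile.ZipFile(fake_csv_path, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")

csv_is_zip = looks_like_xlsx(csv_path)
fake_is_zip = looks_like_xlsx(fake_csv_path)
```

A script could use such a check to choose between read_csv and read_excel.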
I encountered the same error when I used to_csv to write some data and then read it in another script. I found an easy solution that bypasses pandas' read function: the pickle module.
It is part of Python's standard library, so there is nothing to install.
First, write your data with the code below:
import pickle
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)
Then load your data in another script using:
import pickle
with open(path, 'rb') as infile:
    data = pickle.load(infile)
Note that if you want to read the saved data with a different Python version from the one that wrote it, you can specify this in the writing step by passing protocol=x to pickle.dump. Protocol 2 is the highest that Python 2 can read; protocols 3 and above require Python 3.
I hope this can be of any use.
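Put together, a round trip with an explicit protocol might look like the sketch below. The dictionary contents and the temporary file location are hypothetical. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical data to pass between two scripts.
variable_to_save = {"team": "tekha", "scores": [3, 1, 2]}
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

with open(path, "wb") as output:
    # protocol=2 is the newest protocol that Python 2 can still read.
    pickle.dump(variable_to_save, output, protocol=2)

with open(path, "rb") as infile:
    data = pickle.load(infile)
```

The loaded object compares equal to the one that was saved.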
