Extract list to a csv file - python

I need to export to a .csv file a DataFrame that I scrape from a website. I can generate the values, but I can't export them to .csv because of the following error:
AttributeError: 'list' object has no attribute 'to_csv'
code:
import pandas as pd
url = "https://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp?Data=23/01/2023&Data1=20230123&slcTaxa=PRE"
df = pd.read_html(io=url, flavor='html5lib', encoding='latin1')
print(df)
df.to_csv(r'C:/Users/xport_dataframe.csv', index=False, header=True)

You have not made a DataFrame: pd.read_html returns a list of DataFrames, one per table found in the HTML. Pick the table you need from that list (for example df = df[0]), and then you can use df.to_csv(...).

read_html returns a list of dataframes (as explained here).
You need to concatenate this list into a Pandas dataframe prior to exporting it to csv:
import pandas as pd
url = "https://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp?Data=23/01/2023&Data1=20230123&slcTaxa=PRE"
list_of_dfs = pd.read_html(io=url, flavor='html5lib', encoding='latin1')
print(list_of_dfs)
pd.concat(list_of_dfs).to_csv(r'C:/Users/xport_dataframe.csv', index=False, header=True)
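Whether to concatenate or to pick a single table depends on the page. A minimal sketch of both options follows. It assumes pandas is installed, and it uses two hand-built frames as a stand-in for the list that read_html returns, so the column names and values are hypothetical.

```python
import io

import pandas as pd

# Stand-in for the list returned by pd.read_html (hypothetical data).
list_of_dfs = [
    pd.DataFrame({"days": [1, 2], "rate": [13.65, 13.66]}),
    pd.DataFrame({"days": [3], "rate": [13.67]}),
]

# Option 1: export only the first table.
first_csv = io.StringIO()
list_of_dfs[0].to_csv(first_csv, index=False)

# Option 2: stack all tables into one frame, renumbering the rows.
combined = pd.concat(list_of_dfs, ignore_index=True)
combined_csv = io.StringIO()
combined.to_csv(combined_csv, index=False)

print(combined.shape)
```

Concatenation only makes sense when the tables share the same columns; otherwise selecting one element of the list is probably the safer choice.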

Related

Unable to read a column of an excel by Column Name using Pandas

I want to read the values of the column 'Site Name', but the position of this column in the sheet is not fixed.
I tried,
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites', usecols=['Site Name'])
but got value error,
ValueError: Usecols do not match columns, columns expected but not found: ['RBS Name']
The output should be, List of RBS=['TestSite1', 'TestSite2',........]
Try reading the Excel column like this:
import pandas as pd
df = pd.read_excel('File.xlsx', sheet_name='Sheet1')
# Print each value in the 'Site Name' column
for i in df.index:
    print(df['Site Name'][i])
You can first read the Excel file without specifying any column names and inspect the resulting DataFrame.
Then check which column names pandas actually found.
The code is as follows:
import pandas as pd
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites')
print(df.head())
print(df.columns)

Writing row name using "index" while exporting files in csv format

I have a list and wish to export it to a csv file.
I used the following code:
import pandas as pd
import csv
# Define List
Data = [101, 12, 143]
# Convert to dataframe
df_Data = pd.DataFrame(Data)
# Export to csv file
df_Data.to_csv("Data.csv", header=["Data"] , index=["Row1", "Row2", "Row3"])
I am able to rename the column using the "header" option.
However, the row names do not change in the output.
Can somebody please help me out with this in Python?
I recommend not trying to change the index and column names through the to_csv(...) parameters. Set them when you build the DataFrame instead.
So try using this code:
import pandas as pd
import csv
# Define List
Data = [101, 12, 143]
# Convert to dataframe
df_Data = pd.DataFrame(Data, columns=["Data"], index=["Row1", "Row2", "Row3"])
# Export to csv file
df_Data.to_csv("Data.csv")
Then the output CSV would work as expected.
You can set column and row names using a list like this.
df_Data.columns=['Data']
df_Data.index=['Row1','Row2','Row3']
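For context, the index parameter of to_csv is documented as a boolean that switches the row labels on or off, which would explain why the list passed in the question did not rename anything. A short sketch with the same three values, showing the labels being written and the optional index_label parameter naming that first column ("Row" is a hypothetical name chosen for illustration):

```python
import pandas as pd

df_data = pd.DataFrame({"Data": [101, 12, 143]}, index=["Row1", "Row2", "Row3"])

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df_data.to_csv(index=True, index_label="Row")
print(csv_text)
```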

Expected a list of dataframe got just one dataframe

I am trying to convert a list of sheets from an Excel file into csv. To begin with, I want to read the sheets using the following code, but I only get the first sheet and the rest are lost.
import pandas as pd
def accept_xcl_file(file):
    xcl_file = pd.ExcelFile(file)
    sheets = xcl_file.sheet_names
    file = xcl_file.parse(sheet_names = sheets)
    return file, sheets
file, sheet = accept_xcl_file('Companies.xlsx')
sheet >>
this is the output from sheet
['companies',
'fruits',
'vehicles',
'sales',
'P&L',
'price',
'clubs',
'countries',
'housing',
'life-expectancy']
file['fruits'] >>
I get a KeyError when I try to index the file with this key, but when I use the 'companies' key I get the correct data. Going by the documentation, I should expect a DataFrame or a dict of DataFrames.
Any help is appreciated.
The read_excel function is already available in pandas to import Excel data. By default it reads only the first sheet; pass sheet_name=None to read every sheet.
Try this instead of your code:
import pandas as pd
file = pd.read_excel('Companies.xlsx', sheet_name=None)
# file is a dict object
# keys are the sheet names as strings
# values are the pd.DataFrame objects containing sheet data
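To go from a dict of sheets to one CSV per sheet, a sketch like the following may help. The helper name export_sheets and the output folder are hypothetical, and a small hand-built dict stands in for the result of pd.read_excel(..., sheet_name=None) so that the example runs without the workbook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_sheets(sheets, out_dir):
    """Write each DataFrame in a {sheet_name: DataFrame} dict to <out_dir>/<sheet_name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        path = out_dir / f"{name}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
    return written


# Stand-in for: sheets = pd.read_excel('Companies.xlsx', sheet_name=None)
sheets = {
    "companies": pd.DataFrame({"name": ["A", "B"]}),
    "fruits": pd.DataFrame({"fruit": ["apple"]}),
}
out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = export_sheets(sheets, out_dir)
print([p.name for p in paths])
```

Sheet names that contain characters not allowed in file names (a slash, for instance) would need to be sanitized before being used as paths; the sketch does not handle that.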

Is there a way to convert data frame styler object into dataframe in python

I have read xlsx data into a pandas DataFrame and used style.format to format particular columns as percentages and dollars. As a result, my DataFrame has been converted to a Styler object. Because I need to write this data to csv, I have to convert this object back into a DataFrame. Please help.
below is the code and output:
import pandas as pd
import numpy as np
file_path = "./sample_data.xlsx"
df = pd.read_excel(file_path, sheet_name="Channel", skiprows=10, header=[0, 1, 2])
dollar_cols = ['SalesTY', 'SalesLY', 'InStoreTY', 'InStoreLY', 'eCommTY']
# Map each dollar column to its format string
formatdict = {}
for dollar_col in dollar_cols:
    formatdict[dollar_col] = "${:,.0f}"
final_df = df.style.format(formatdict)
Here final_df has the columns converted to dollars but I am unable to convert this to csv or into a data frame. It's a styler object now, I need to convert this into a data frame again. Any help is appreciated. Thanks.
You can retrieve the original dataframe from the styler object using the "data" attribute.
In your example:
df = final_df.data
type(df) yields
pandas.core.frame.DataFrame
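Note that the data attribute holds the original, unformatted values, since the Styler only affects how the frame is displayed. If the CSV itself should contain the formatted strings, one option is to format the columns directly. A sketch with hypothetical values, assuming flat (single-level) column names rather than the three header rows used in the question:

```python
import pandas as pd

# Hypothetical sample data; only SalesTY is treated as a dollar column here.
df = pd.DataFrame({"SalesTY": [1200.0, 2500000.75], "Units": [3, 4]})
dollar_cols = ["SalesTY"]

# Format on a copy so the numeric data stays available for calculations.
formatted = df.copy()
for col in dollar_cols:
    formatted[col] = formatted[col].map("${:,.0f}".format)

# With no path given, to_csv returns the CSV text.
csv_text = formatted.to_csv(index=False)
print(csv_text)
```

The formatted columns become plain strings, so this step is best done last, right before exporting.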

How to import all fields from xls as strings into a Pandas dataframe?

I am trying to import a file from xlsx into a Python Pandas dataframe. I would like to prevent fields/columns being interpreted as integers and thus losing leading zeros or other desired heterogenous formatting.
So for an Excel sheet with 100 columns, I would do the following using a dict comprehension with range(100).
import pandas as pd
filename = r'C:\DemoFile.xlsx'
fields = {col: str for col in range(100)}
df = pd.read_excel(filename, sheet_name=0, converters=fields)
These import files do have a varying number of columns all the time, and I am looking to handle this differently than changing the range manually all the time.
Does somebody have any further suggestions or alternatives for reading Excel files into a dataframe and treating all fields as strings by default?
Many thanks!
Try this:
xl = pd.ExcelFile(r'C:\DemoFile.xlsx')
ncols = xl.book.sheet_by_index(0).ncols
df = xl.parse(0, converters={i : str for i in range(ncols)})
UPDATE:
In [261]: type(xl)
Out[261]: pandas.io.excel.ExcelFile
In [262]: type(xl.book)
Out[262]: xlrd.book.Book
Use dtype=str when calling .read_excel():
import pandas as pd
filename = r'C:\DemoFile.xlsx'
df = pd.read_excel(filename, dtype=str)
The usual solution is:
1. Read in one row of data just to get the column names and the number of columns.
2. Create the dictionary automatically, where each column has a string type.
3. Re-read the full data using the dictionary created at step 2.
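The three steps above could be sketched roughly as follows. The function names are hypothetical, nrows=0 is used here to read only the header row (rather than one data row), and reading a workbook assumes that an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def string_converters(columns):
    """Step 2: map every column label to the str converter."""
    return {col: str for col in columns}


def read_excel_as_strings(path, sheet_name=0):
    """Read one sheet, converting every column's values with str."""
    # Step 1: read only the header to learn the column labels.
    header = pd.read_excel(path, sheet_name=sheet_name, nrows=0)
    # Step 3: re-read the full sheet with one str converter per column.
    return pd.read_excel(
        path, sheet_name=sheet_name, converters=string_converters(header.columns)
    )


# The dict built in step 2 for two hypothetical column names:
print(string_converters(["id", "zip"]))
```

Because the converters are built from whatever columns the header contains, the same call should keep working when the number of columns changes between files.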
