Pandas html to df - commas in numbers


I'm new to Python. I need to download some tables from Polish-language web pages.
I have a problem with commas in numbers: it seems that pandas deletes them.
For example:
import pandas as pd
x = pd.read_html('https://www.gpw.pl/wskazniki', encoding='utf-8', decimal=",")[1]
The result in the C/WK column is "021" instead of "0,21".
How can I download it properly, or convert it to "0.21"?
Thank you

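The issue is with the thousands separator, which also defaults to a comma. With decimal="," both separators are the same character, so the comma is stripped as a thousands separator before the number is parsed.
To read the data and parse it correctly, use:
pd.read_html('https://www.gpw.pl/wskazniki', encoding='utf-8', decimal=',', thousands='.')[1]
With these settings, a value such as "0,21" is parsed as 0.21.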
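The same decimal and thousands parameters also exist on pd.read_csv, which makes the effect easy to try offline. The sketch below is only an illustration: the two rows, the company names, and the semicolon-separated layout are made up, and only the C/WK column name comes from the question.

```python
import io

import pandas as pd

# Made-up sample in Polish number format: comma as decimal mark,
# dot as thousands separator. Only the "C/WK" column name is from the question.
raw = io.StringIO(
    "Spolka;C/WK\n"
    "AAA;0,21\n"
    "BBB;1.234,5\n"
)

# decimal=',' alone is not enough if ',' is also the thousands separator,
# so the thousands separator is set explicitly to '.'.
df = pd.read_csv(raw, sep=";", decimal=",", thousands=".")

print(df)
print(df.dtypes)
```

Both values should come back as floats rather than strings, with the comma treated as the decimal mark.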

Related

Problems with Pandas reading txt file

I am having a rough time getting my code (Python 3) to read a txt file. I am using pandas, and it reads the file and gets the right number of rows, but it treats each line as a single value, so the entire DataFrame ends up in one column, 0. Here is an example of the code.
import pandas as pd
import numpy as np
data = pd.read_csv(r'file.txt',header=None)
I have also tried setting the delimiter/separator in that line of code, such as \t or ' ', but then it couldn't read the file.
Here is an example of what the file looks like.
JK+0923 7.05 19.3 200.4 -56.1 0.140 0.022 2010 GHT-Jermi
As you can see, there is no header.
Either way, would like help. Thanks.
I want it to read the columns correctly.

import pandas as pd
import numpy as np
data = pd.read_csv(r'asd.txt',header=None,sep='\t')
This should work if the delimiter in your case is a tab.
Alternatively, you can pass a regex such as \s+ as the value of sep, so that any run of whitespace is accepted as the delimiter.
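As a quick check of the \s+ suggestion, here is a minimal sketch that parses the sample line from the question out of an in-memory string. The mix of a tab and double spaces is an assumption added to show that irregular whitespace is handled; the real file may differ.

```python
import io

import pandas as pd

# The sample line from the question, with a tab and uneven spaces added
# on purpose (an assumption about how the real file might look).
raw = io.StringIO("JK+0923\t7.05  19.3 200.4 -56.1 0.140 0.022 2010 GHT-Jermi\n")

# sep=r"\s+" treats any run of spaces or tabs as one delimiter;
# header=None because the file has no header row.
data = pd.read_csv(raw, header=None, sep=r"\s+")

print(data.shape)
print(data.iloc[0].tolist())
```

The columns are numbered 0 to 8 because no header is supplied; passing names=[...] would label them.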

The pd.read_csv() function expects a header when used in the standard way. However, you can specify the header=None parameter, see this question for more details:
Pandas read in table without headers
As you pointed out in your question, you have already tried to specify the delimiter when reading in the file, so the combination of both should help you read the file in correctly:
data = pd.read_csv(r'file.txt',header=None, sep='\t')

Converting CSV to HTML keeping format

My objective: convert a DataFrame to HTML, which is sent as a daily email.
Current method: convert the DataFrame to CSV, then to HTML.
Problem: I created my DataFrame with as_index=True set, but when I save it to a CSV this formatting is lost:
Example DataFrame:
Now when I save this DataFrame using to_csv(), the formatting in the index is lost (ABC is now written three times down the index, instead of once as I want).
I want the CSV to have the same formatting. Is that possible?

You can skip the CSV step and call to_html() on the DataFrame directly.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_html.html
Hope this helps.
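A minimal sketch of what that looks like with a two-level index. The column names and values here are invented; only the repeated ABC label comes from the question. With its default settings, to_html() merges repeated outer index labels into a single cell spanning several rows.

```python
import pandas as pd

# Invented data: three rows that share the outer index label "ABC".
df = pd.DataFrame(
    {
        "group": ["ABC", "ABC", "ABC"],
        "item": ["x", "y", "z"],
        "value": [1, 2, 3],
    }
)
indexed = df.set_index(["group", "item"])

# to_html() sparsifies the MultiIndex by default, so "ABC" is emitted
# once in a cell with a rowspan instead of once per row.
html = indexed.to_html()

print(html)
```

The resulting string can be placed in the body of an HTML email without going through a CSV file.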

Saving DataFrame to csv but output cells type becomes number instead of text

import pandas as pd
check = pd.read_csv('1.csv')
nocheck = check['CUSIP'].str[:-1]
nocheck = nocheck.to_frame()
nocheck['CUSIP'] = nocheck['CUSIP'].astype(str)
nocheck.to_csv('NoCheck.csv')
This works, but when the csv file is opened in Excel, an identifier such as 0003418 (type = str) shows up as 3418 (type = general). How do I avoid this?

I couldn't find a dupe for this question, so I'll post my comment as a solution.
This is an Excel issue, not a Python error. Excel autoformats numeric columns and removes leading zeros. You can "fix" this by forcing pandas to quote when writing:
import csv
# insert pandas code from question here
# use csv.QUOTE_ALL when writing CSV.
nocheck.to_csv('NoCheck.csv', quoting=csv.QUOTE_ALL)
Note that this will actually put quotes around each value in your CSV. It will render the way you want in Excel, but you may run into issues if you try to read the file some other way.
Another solution is to write the CSV without quoting, and import the column as "Text" in Excel instead of letting it default to "General".
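To confirm that the leading zeros really are in the file and that only the viewer drops them, you can inspect the raw CSV text. This is a sketch with made-up identifiers written to an in-memory buffer instead of NoCheck.csv.

```python
import csv
import io

import pandas as pd

# Made-up identifiers stored as strings, standing in for the CUSIP column.
nocheck = pd.DataFrame({"CUSIP": ["0003418", "0123456"]})

# Write to an in-memory buffer so the raw text can be inspected.
buf = io.StringIO()
nocheck.to_csv(buf, index=False, quoting=csv.QUOTE_ALL)
csv_text = buf.getvalue()
print(csv_text)

# Reading back with an explicit string dtype keeps the leading zeros;
# without dtype, pandas would infer integers and drop them too.
round_trip = pd.read_csv(io.StringIO(csv_text), dtype={"CUSIP": str})
print(round_trip["CUSIP"].tolist())
```

If the zeros are present in the printed text, pandas wrote the data correctly and any loss happens when the file is opened elsewhere.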

Python: convert excel data into dataframes

I want to put some data available in an excel file into a dataframe in Python.
The code I use is as below (two examples I use to read an excel file):
d=pd.ExcelFile(fileName).parse('CT_lot4_LDO_3Tbin1')
e=pandas.read_excel(fileName, sheetname='CT_lot4_LDO_3Tbin1',convert_float=True)
The problem is that the values in the dataframe I get have only two digits after the decimal point. In other words, the Excel values are like 0.123456, but the values in the dataframe are like 0.12.
Some rounding seems to be applied, but I cannot find how to change it.
Can anyone help me?
Thanks for the help!

You can try this. I used test.xlsx which has two sheets, and 'CT_lot4_LDO_3Tbin1' is the second sheet. I also set the first value as Text format in excel.
import pandas as pd
fileName = 'test.xlsx'
df = pd.read_excel(fileName, sheet_name='CT_lot4_LDO_3Tbin1')  # sheet_name replaced sheetname in pandas 0.21
Result:
In [9]: df
Out[9]:
       Test
0  0.123456
1  0.123456
2  0.132320
Without seeing the real raw data file, I think this is the best answer I can think of.

Well, when I try:
df = pd.read_csv(r'my file name')
I get something like this in df:
http://imgur.com/a/Q2upp
And I cannot put .fileformat in the statement.

You might be interested in removing column datatype inference that pandas performs automatically. This is done by manually specifying the datatype for the column. Here is what you might be looking for.
Python pandas: how to specify data types when reading an Excel file?
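As a rough illustration of turning off type inference for one column, here is a sketch that uses read_csv on an in-memory string in place of the Excel file, which is an assumption made so the example runs without a workbook. The column name and values are taken from the output shown in the earlier answer.

```python
import io

import pandas as pd

text = "Test\n0.123456\n0.123456\n0.132320\n"

# Default behaviour: pandas infers a float column.
as_float = pd.read_csv(io.StringIO(text))

# With an explicit dtype, the values are kept exactly as written in the file.
as_text = pd.read_csv(io.StringIO(text), dtype={"Test": str})

print(as_float.dtypes)
print(as_text["Test"].tolist())
```

Reading a column as text is mainly useful for checking what is actually stored in the file before any numeric conversion or display formatting is applied.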

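Using pandas 0.20.1, something like this should work for a delimited text file:
df = pd.read_csv('CT_lot4_LDO_3Tbin1.fileformat')
For example, with a CSV export of the sheet:
df = pd.read_csv('CT_lot4_LDO_3Tbin1.csv')
Note that read_csv only parses delimited text; for an .xlsx workbook, use pd.read_excel instead.
Read this documentation:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html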

Python: Using Pandas library. How to keep quotes on text?

I'm using the following Python code with the pandas library. The purpose of the code is to join two CSV files, and it works as expected. In the CSV files, all the values are enclosed in "". After going through pandas, the quotes disappear. I wonder what I can do to keep them? I have read the documentation and tried lots of options but can't seem to get it right.
Any help is much appreciated.
Code:
import pandas
csv1 = pandas.read_csv('WS-Produktlista-2015-01-25.csv', quotechar='"',comment='"')
csv2 = pandas.read_csv('WS-Prislista-2015-01-25.csv', quotechar='"', comment='"')
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False)
Instead of getting a line like this:
"1","Cologne","4711","4711","100ml",
I'm getting:
1,Cologne,4711,4711,100ml,
EDIT:
I have now found the problem. My files contain a header with 16 columns. The data lines contain 16 values separated by ",".
I just found that some lines contain values within "" that themselves contain ",". This is confusing the parser. Instead of the expected 15 commas, it finds 18. One example below:
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup",**"7,2g"**,"W","Decorative range","5x**1,2**g Eye Shadow + **1,2**g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can I make the parser ignore the commas within ""?
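One possible approach, offered as a sketch rather than a confirmed fix for these files: leave out the comment='"' argument, which makes the parser treat everything after a double quote as a comment, and rely on the default quotechar so that commas inside quotes stay part of the value. Passing quoting=csv.QUOTE_ALL to to_csv then puts the quotes back on output. The shortened rows and the name, size, and price columns below are invented; the real files have 16 columns.

```python
import csv
import io

import pandas as pd

# Shortened, partly invented stand-ins for the two CSV files in the question.
products = io.StringIO(
    '"id","name","size"\n'
    '"23210","Eyes Virtuose Palette Makeup","7,2g"\n'
)
prices = io.StringIO(
    '"id","price"\n'
    '"23210","1"\n'
)

# No comment='"' here. The default quotechar ('"') already keeps the comma
# in "7,2g" inside one field; dtype=str avoids any numeric conversion.
csv1 = pd.read_csv(products, dtype=str)
csv2 = pd.read_csv(prices, dtype=str)
merged = csv1.merge(csv2, on="id")

# QUOTE_ALL writes every field, including the header, inside quotes.
out = io.StringIO()
merged.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
output_text = out.getvalue()
print(output_text)
```

In the real script, the StringIO objects would be replaced by the file names, and the final to_csv call would write to "output.csv".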
