Python: convert excel data into dataframes


I want to load some data from an Excel file into a dataframe in Python.
This is the code I use (two ways of reading the Excel file):
d=pd.ExcelFile(fileName).parse('CT_lot4_LDO_3Tbin1')
e=pandas.read_excel(fileName, sheetname='CT_lot4_LDO_3Tbin1',convert_float=True)
The problem is that the values in the dataframe have only two digits after the decimal point. In other words, the Excel values look like 0.123456, but the values I get in the dataframe look like 0.12.
Some kind of rounding seems to be applied, but I cannot find how to change it.
Can anyone help me?
Thanks for the help!

You can try this. I used test.xlsx which has two sheets, and 'CT_lot4_LDO_3Tbin1' is the second sheet. I also set the first value as Text format in excel.
import pandas as pd
fileName = 'test.xlsx'
# recent pandas versions spell this keyword sheet_name (older ones used sheetname)
df = pd.read_excel(fileName, sheet_name='CT_lot4_LDO_3Tbin1')
Result:
In [9]: df
Out[9]:
Test
0 0.123456
1 0.123456
2 0.132320
Without seeing the actual data file, this is the best answer I can offer.
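One possible cause, which the question does not confirm, is that only the display is rounded while the stored floats are intact. The sketch below uses hypothetical in-memory values instead of the Excel file and shows how the pandas option display.precision changes what is printed without changing the data.

```python
import pandas as pd

# Hypothetical values standing in for the Excel column from the question.
df = pd.DataFrame({"Test": [0.123456, 0.123456, 0.13232]})

# With a display precision of 2, the rendered frame shows rounded values.
with pd.option_context("display.precision", 2):
    rendered_short = df.to_string()

# The stored float is untouched by the display setting.
first_value = df["Test"].iloc[0]

# With a precision of 6 (the pandas default), all digits are shown again.
with pd.option_context("display.precision", 6):
    rendered_full = df.to_string()

print(rendered_short)
print(rendered_full)
print(first_value)
```

If this is what is happening, pd.set_option("display.precision", 6) restores the longer display for the whole session.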

Well, when I try:
df = pd.read_csv(r'my file name')
I get something like this in df:
http://imgur.com/a/Q2upp
And I cannot put .fileformat in the statement.

You may want to disable the column datatype inference that pandas performs automatically. This is done by specifying the datatype for the column manually. Here is what you might be looking for:
Python pandas: how to specify data types when reading an Excel file?
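As a rough illustration of specifying the datatype manually, the sketch below reads a small in-memory CSV (hypothetical data, used so that no Excel file is needed) once with inference and once with dtype set to str. read_excel accepts the same dtype keyword.

```python
import io
import pandas as pd

# Hypothetical data; the column name Test is taken from the answer above.
csv_text = "Test\n0.123456\n0.123456\n0.132320\n"

# Default behaviour: pandas infers a float column.
inferred = pd.read_csv(io.StringIO(csv_text))

# Inference disabled for this column: values stay exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Test": str})

print(inferred["Test"].dtype)
print(as_text["Test"].tolist())
```

Reading a column as text keeps every character of the original value, including trailing zeros, at the cost of converting to numbers later.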

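Using pandas 0.20.1, something like this should work for a delimited text file, where .fileformat stands for the file's actual extension:
df = pd.read_csv('CT_lot4_LDO_3Tbin1.fileformat')
Note that read_csv only parses text files. For an Excel workbook, for example, use read_excel instead:
df = pd.read_excel('CT_lot4_LDO_3Tbin1.xlsx')
The read_csv documentation is here:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html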

Related

reshape data using python?

Being new to Python, I am looking for some help reshaping this data. I already know how to do it in Excel but want a Python-specific solution.
I want it to be in this format.
The entire dataset is 70k rows with different vc_firm_names. Any help would be great.

If you care about performance, then I suggest you take a look at other methods (such as using numpy, or sorting the table):
https://stackoverflow.com/a/42550516/17323241
https://stackoverflow.com/a/66018377/17323241
https://stackoverflow.com/a/22221675/17323241 (look at second comment)
Otherwise, you can do:
import pandas as pd
# load data from csv file
df = pd.read_csv("example.csv")
# aggregate: collect each firm's industries into a list
df.groupby("vc_firm_name")["investment_industry"].apply(list)

Assuming the original file is "original.csv", and you want to save it as "new.csv" I would do:
pd.read_csv("original.csv").groupby(by=["vc_firm_name"],as_index=False).aggregate(lambda x: ','.join(x)).to_csv("new.csv", index=False)
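To make the two approaches above easier to compare, here is a small sketch on an in-memory frame. The column names come from the answers; the firm names and industries are made-up sample values.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the answers above.
df = pd.DataFrame({
    "vc_firm_name": ["Firm A", "Firm A", "Firm B"],
    "investment_industry": ["Biotech", "Software", "Energy"],
})

# Variant 1: one list of industries per firm (result is a Series).
industries = df.groupby("vc_firm_name")["investment_industry"].apply(list)

# Variant 2: one comma-separated string per firm (result is a DataFrame).
joined = (
    df.groupby("vc_firm_name", as_index=False)["investment_industry"]
    .agg(lambda values: ",".join(values))
)

print(industries)
print(joined)
```

The list form is easier to process further in Python, while the joined string form is the one that writes cleanly to a CSV file.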

What code should I use in extracting specific column (with specific data) from a csv file to python. It can be either pandas or numpy

I only need to import a specific column, filtered by a condition (such as specific values found in that column). I also want to leave out the unnecessary columns, because dropping them takes too much code. What code or syntax should I use?

How to get a column from pandas dataframe is answered in Read specific columns from a csv file with csv module?
To quote:
Pandas is spectacular for dealing with csv files, and the following
code would be all you need to read a csv and save an entire column
into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
So in your case, you just save the filtered data frame in a new variable.
This means you do newdf = data.loc[...... and then use the code snippet from above to extract the column you want, for example newdf.continent
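A minimal sketch of the whole flow, assuming a hypothetical CSV with country, continent and population columns (only continent is mentioned above). The usecols parameter of read_csv loads just the listed columns, so nothing has to be dropped afterwards.

```python
import io
import pandas as pd

# Hypothetical file contents, kept in memory so the example is self-contained.
csv_text = (
    "country,continent,population\n"
    "France,Europe,67\n"
    "Kenya,Africa,54\n"
    "Spain,Europe,47\n"
)

# Read only the columns that are needed.
data = pd.read_csv(io.StringIO(csv_text), usecols=["country", "continent"])

# Keep the rows that satisfy the condition, then extract one column.
newdf = data.loc[data["continent"] == "Europe"]
saved_column = newdf["country"]

print(saved_column.tolist())
```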

How to remove index column when reading with read_csv in pandas

We are trying to read a sample simple csv file using pandas in python as follows -
df = pd.read_csv('example.csv')
print(df)
We need df without the index column highlighted in red below:
We have tried passing various parameters, but with no luck.
Please help with this issue!

A dataframe requires having some kind of index as part of the structure.
If you want to simply print the output without the index you can use the approach suggested here, with Python 3 syntax:
print(df.to_string(index=False))
but it will not have the nice dataframe rendering in Jupyter as you have in your example.
If you want to avoid pandas outputting the index when writing to a CSV file you can use the option index=False, for example:
df.to_csv('example.csv', index=False)
This will avoid creating the index column in the saved CSV file.
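Both options can be tried on a small frame. This is only a sketch with made-up columns a and b; it writes the CSV to a string instead of a file so that it runs anywhere.

```python
import io
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Text rendering without the index column.
printed = df.to_string(index=False)

# CSV output without the index column; with no path, to_csv returns a string.
csv_text = df.to_csv(index=False)

# Reading that CSV back gives the same columns and no extra index column.
reloaded = pd.read_csv(io.StringIO(csv_text))

print(printed)
print(csv_text)
```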

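Add index_col=False so that pandas does not use the first column as the index:
pd.read_csv('path.csv',index_col=False)
Or reset the index of the dataframe to the default integer range (the dataframe still has an index, but the old one is discarded):
df.reset_index(drop=True, inplace=True)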
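A related case, which is an assumption about the asker's file and not something the question states, is a CSV that was saved together with its index. Reading it back then produces an extra "Unnamed: 0" column, and index_col=0 tells read_csv to treat that column as the index. The sketch also shows what reset_index(drop=True) does after filtering.

```python
import io
import pandas as pd

# Made-up data, saved WITH the index (the to_csv default).
original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
csv_with_index = original.to_csv()

# A plain read turns the saved index into an extra column.
naive = pd.read_csv(io.StringIO(csv_with_index))

# index_col=0 uses the first column as the index instead.
fixed = pd.read_csv(io.StringIO(csv_with_index), index_col=0)

# After filtering, reset_index(drop=True) renumbers the rows from 0.
filtered = original[original["a"] > 1].reset_index(drop=True)

print(naive.columns.tolist())
print(fixed.columns.tolist())
print(filtered.index.tolist())
```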

How do I convert these 2 columns as seen below in In [10] to a dataframe/table to be able to export to a csv file

Hi, I am very new to Python and I plan to create a final exportable table from these reviews, scraped from a website, to see which words were used most. I have managed to get these 2 columns but have no idea how to proceed. Can I export this directly into a table in Excel, or must I convert it into a dataframe and then export it to a CSV? And what code is required to do so? Thank you so much for your help!

It's convenient to use the pandas library for working with dataframes:
import pandas as pd
series = pd.Series(wordcount)
series.to_csv("wordcount.csv")
However, if you use the code above, you'll get a warning. To fix it, there are 2 ways:
1) Add header parameter:
series.to_csv("wordcount.csv", header=True)
2) Or convert series to dataframe and then save it (without new index):
df = series.reset_index()
df.to_csv("wordcount.csv", index=False)
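For a complete picture, here is a sketch that starts from raw text. It assumes wordcount is a dict-like mapping of word to count (built here with collections.Counter from a made-up sentence) and uses the hypothetical column names word and count.

```python
from collections import Counter

import pandas as pd

# Assumption: wordcount maps each word to its frequency.
wordcount = Counter("good food good service good price".split())

# Series indexed by word, then a two-column DataFrame with named columns.
series = pd.Series(wordcount)
df = series.rename_axis("word").reset_index(name="count")

# Most frequent words first.
df = df.sort_values("count", ascending=False)

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Naming the columns before saving gives the CSV a readable header row, which spreadsheet programs such as Excel pick up when opening the file.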

Converting CSV to HTML keeping format

My objective: convert a DataFrame to HTML, which is sent as a daily email.
Current method: convert the df to CSV, then to HTML.
Problem: I created my df with as_index=True set, but when I save it to a CSV this formatting is lost:
Example DataFrame:
Now when I save this df using to_csv(), the formatting of the index is lost (that is, ABC is now written 3 times down the index instead of once, as I want it).
I want the CSV to have the same formatting. Is that possible?

You can skip the CSV step: with pandas, call to_html() on the DataFrame directly.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_html.html
Hope this helps.
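The difference can be seen on a small grouped frame. This is a sketch with made-up data (the label ABC comes from the question, the other names are hypothetical): to_csv writes the outer index label on every row, while to_html emits it once in a cell that spans the group's rows.

```python
import pandas as pd

# Made-up data; group and item become a two-level index after groupby.
df = pd.DataFrame({
    "group": ["ABC", "ABC", "ABC", "XYZ"],
    "item": ["x", "y", "z", "x"],
    "value": [1, 2, 3, 4],
})
grouped = df.groupby(["group", "item"]).sum()

# CSV is flat text, so the outer label is repeated on each row.
csv_text = grouped.to_csv()

# HTML keeps the grouped layout: the outer label appears once.
html = grouped.to_html()

print(csv_text.count("ABC"))
print(html.count("ABC"))
```

The resulting html string can be used as the body of an HTML email without going through a CSV file.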
