Issue with pandas dataframe - python

When I tried to read data from a CSV file, the result came out in a weird format.
import pandas as pd
df = pd.read_csv('HDMA Boston Housing Data.csv')
df

The file appears to be a plain .txt file that was renamed to .csv, so read_csv parses it as comma-separated even though it probably uses a different delimiter.
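If the renamed file uses a delimiter other than a comma (tabs are common in .txt exports, but that is an assumption about this particular file), one option is to let pandas detect the separator. This minimal sketch passes sep=None together with engine='python', which makes read_csv sniff the delimiter from the first line. The helper name read_delimited_text is hypothetical.

```python
import pandas as pd

def read_delimited_text(path_or_buffer):
    """Read a delimited text file whose separator is unknown (e.g. a renamed .txt)."""
    # sep=None tells pandas to detect the delimiter with csv.Sniffer;
    # delimiter sniffing is only supported by the Python parsing engine.
    return pd.read_csv(path_or_buffer, sep=None, engine='python')
```

For example, read_delimited_text('HDMA Boston Housing Data.csv'). If you already know the delimiter, passing it explicitly (for example sep='\t') is faster and more predictable than sniffing.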

Related

downloading CSV file in python using pandas

I am trying to download a CSV file in Python, but for some reason I cannot do it. Do I need to pass an additional argument to read_csv?
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)
Your code downloads the content from the URL and loads it into the DataFrame named 'df'; it does not save anything to disk.
To save it as a CSV file, add the to_csv line below. The output file is written to the current working directory, which is usually, but not always, the folder the Python script is in.
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)
df.to_csv('output.csv', index=False)  # index=False avoids writing the row index as an extra column
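If you only want a local copy of the file and don't need pandas to parse it first, the standard library can save the raw file directly with urllib.request.urlretrieve. This is a different technique from the answer above, shown as a sketch; download_file is a hypothetical helper name, and resolving the destination to an absolute path is just one way to make the save location unambiguous.

```python
import urllib.request
from pathlib import Path

def download_file(url, dest):
    """Save the raw file at `url` to `dest` without parsing it with pandas."""
    # Resolve to an absolute path so it is clear where the file ends up,
    # regardless of the current working directory.
    dest_path = Path(dest).resolve()
    urllib.request.urlretrieve(url, str(dest_path))
    return dest_path
```

For example, download_file(url, 'starwars.csv') with the url defined above would save the file and return its absolute path.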

Convert JSON file to CSV file with properly formatted rows and columns in Excel

I'm currently working on a script that converts a JSON file to CSV. The script works, but I need to modify it so that the converted CSV file has a proper layout with rows and columns. What do I need to add or change in my script?
import pandas as pd
df = pd.read_json (r'/home/admin/myfile.json')
df.to_csv (r'/home/admin/xml/myfileSample.csv', index = None, sep=":")
Taking your code as a starting point, you can try:
df.to_csv(r'/home/admin/xml/myfileSample.csv', encoding='utf-8', header=True, index=False, sep=":")
This could be useful.
import pandas as pd
df_json=pd.read_json("input_file.json")
df_json.head()
df_json.to_csv("output_file.csv",index=False)
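Your code is fine; just change to_csv to to_excel and it should work. Note that to_excel has no sep parameter, and it needs an Excel file name such as .xlsx (writing .xlsx requires an engine such as openpyxl to be installed):
import pandas as pd
df = pd.read_json(r'/home/admin/myfile.json')
df.to_excel(r'/home/admin/xml/myfileSample.xlsx', index=False)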
Learn more about the to_excel function of pandas here:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
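Two things often get in the way of proper rows and columns when converting JSON to CSV, though which one applies to this particular file is an assumption. First, sep=":" separates fields with colons, which a spreadsheet will typically not split into columns (Excel generally uses the list separator from its regional settings, often a comma). Second, nested JSON objects end up as dictionary-valued cells. This sketch keeps the default comma separator and uses pd.json_normalize to flatten nested objects into columns; json_to_csv and the sample data are hypothetical.

```python
import json
import pandas as pd

def json_to_csv(json_text, csv_path_or_buffer):
    """Flatten a JSON array of (possibly nested) objects into a comma-separated CSV."""
    records = json.loads(json_text)
    # json_normalize turns nested objects into dotted columns such as 'user.name'
    df = pd.json_normalize(records)
    # Keep the default comma separator; index=False drops the row-number column
    df.to_csv(csv_path_or_buffer, index=False)
    return df
```

If the JSON file is a flat list of records, pd.read_json as in the question already produces proper columns, and only the separator needs to change.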

Reading in a stata .dta file as a python pandas data frame using pd.read_stata()

I want to read a .dta file in as a pandas data frame.
I've tried using code from https://www.fragilefamilieschallenge.org/using-dta-files-in-python/ but it gives me an error.
Thanks for any help!
import pandas as pd
df_path = "https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1"
df = None
with open(df_path, "r") as f:
    df = pd.read_stata(f)
print(df.head())
open can be used when the file is saved locally on your machine. With pd.read_stata it is not necessary, however, because you can pass the file path directly as a parameter.
In this case you want to read a .dta file from a URL, which open cannot handle anyway. The solution is simple, though: pd.read_stata can read files from URLs directly.
import pandas as pd
url = 'https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1'
df = pd.read_stata(url)
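The question's snippet also opened the file with mode "r" (text mode). A .dta file is binary, so even for a local copy, going through open() would need mode "rb". This sketch contrasts passing a path directly with passing a binary file handle; read_dta and read_dta_from_handle are hypothetical helper names.

```python
import pandas as pd

def read_dta(path_or_url):
    """Read a Stata .dta file; pd.read_stata accepts a local path or a URL directly."""
    return pd.read_stata(path_or_url)

def read_dta_from_handle(path):
    """Same result via open(), for a local file only."""
    # .dta is a binary format, so the file must be opened with "rb", not "r".
    with open(path, "rb") as f:
        return pd.read_stata(f)
```

Passing the path (or URL) directly is the simpler choice; the file-handle version only matters if you already have an open binary stream.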

How can I reprocess it to get a well laid out CSV file like the one in the attached pictures?

I have the winequality-red CSV file, which is separated with semicolons, and I would like to convert it into a proper .csv file so I can easily extract the attributes. The semicolon-separated data looks all muddled up. (Screenshots: original CSV data; desired CSV data.)
You can just import it with pandas and re-export it.
import pandas as pd
df = pd.read_csv('your_file.csv', sep=';')
df.to_csv('your_file.csv', index=False)  # overwrites the file with a comma-separated version, without an index column
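The same re-export can be wrapped in a small function and pointed at a new file name, so the original semicolon-separated file is kept. This is a sketch; semicolon_to_comma is a hypothetical name, and the column names in the test data are illustrative rather than taken from the question's file.

```python
import pandas as pd

def semicolon_to_comma(src, dst):
    """Re-save a semicolon-separated file as a standard comma-separated CSV."""
    df = pd.read_csv(src, sep=';')  # parse using the file's real separator
    df.to_csv(dst, index=False)     # default comma separator, no index column
    return df
```

For example, semicolon_to_comma('winequality-red.csv', 'winequality-red-comma.csv') leaves the input untouched and writes the converted copy next to it.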

How to write CSV files into XLSX using Python Pandas?

I have several .csv files and I want to write them into one .xlsx file as spreadsheets.
I've loaded these .csv files into Pandas.DataFrame using following code:
df1 = pandas.read_csv('my_file1.csv')
df2 = pandas.read_csv('my_file2.csv')
......
df5 = pandas.read_csv('my_file5.csv')
But I couldn't find any function in pandas that writes these DataFrames into one .xlsx file as separate worksheets.
Can anyone help me with this?
With a recent enough version of pandas, use DataFrame.to_excel() with a shared ExcelWriter object and pass a sheet name for each DataFrame:
from pandas.io.excel import ExcelWriter
import pandas
# Replace ... with the rest of your file names
csv_files = ['my_file1.csv', 'my_file2.csv', ..., 'my_file5.csv']
with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in csv_files:
        # Each CSV becomes its own worksheet, named after the file
        pandas.read_csv(csv_file).to_excel(ew, sheet_name=csv_file)
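Using the file name as the sheet name works for short names, but Excel limits worksheet names to 31 characters and forbids the characters : \ / ? * [ ]. This sketch adds a hypothetical sheet_name_for helper that strips the extension and enforces those limits; csvs_to_xlsx is likewise a hypothetical wrapper, and it still needs an Excel engine such as openpyxl installed to actually write the workbook.

```python
import re
from pathlib import Path

import pandas as pd

# Characters Excel does not allow in worksheet names.
_INVALID_SHEET_CHARS = re.compile(r'[:\\/?*\[\]]')

def sheet_name_for(csv_path):
    """Turn a file path into a valid Excel sheet name (hypothetical helper)."""
    name = Path(csv_path).stem                  # 'data/my_file1.csv' -> 'my_file1'
    name = _INVALID_SHEET_CHARS.sub('_', name)  # replace forbidden characters
    return name[:31] or 'Sheet'                 # Excel caps names at 31 characters

def csvs_to_xlsx(csv_paths, xlsx_path):
    """Write each CSV to its own worksheet in one .xlsx workbook."""
    with pd.ExcelWriter(xlsx_path) as writer:  # requires openpyxl or xlsxwriter
        for csv_path in csv_paths:
            df = pd.read_csv(csv_path)
            df.to_excel(writer, sheet_name=sheet_name_for(csv_path), index=False)
```

One caveat of this design: two files whose names only differ after the 31st character would map to the same sheet name, so very long, similar file names need a different naming scheme.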
