What do I have to change that Jupyter shows columns? - python

I just want to import this csv file. It can read it but somehow it doesn't create columns. Does anyone know why?
This is my code:
import pandas as pd
songs_data = pd.read_csv('../datasets/spotify-top50.csv', encoding='latin-1')
songs_data.head(n=10)
Result that I see in Jupyter:
P.S.: I'm kinda new to Jupyter and programming, but after all I found it should work properly. I don't know why it doesn't do it.

To properly load a csv file you should specify some parameters. for example in you case you need to specify quotechar:
df = pd.read_csv('../datasets/spotify-top50.csv',quotechar='"',sep=',', encoding='latin-1')
df.head(10)
If you still have a problem you should have a look at your CSV file again and also pandas documentation, so that you can set parameters to match with your CSV file structure.

Related

how to convert the CSV to parquet after renaming column name?

I am using the below code to rename one of the column name in a CSV file.
input_filepath ='/<path_to_file>/2500.csv'
df_csv = spark.read.option('header', True).csv(input_filepath)
df1 = df_csv.withColumnRenamed("xyz", "abc")
df1.printSchema()
So, the above code works fine. however, I wanted to also convert the CSV to parquet format. If I am correct, the above code will make the changes and put in the memory and not to the actual file. Please correct me if I am wrong.
If the changes are kept in memory, then how can I put the changes to parquet file?
For the file format conversion, I will be using below code snippet
df_csv.write.mode('overwrite').parquet()
But not able to figure out how to use it in this case. Please suggest
Note: I am suing Databricks notebook for all the above steps.
Hi I am not 100% sure if that will solve your issue but can you try to save your csv like this:
df.to_parquet(PATH_WHERE_TO_STORE)
Let me know if that helped.
Edit: My workflow usually goes like this
Export dataframe as csv
Check visually if everything is correct
Run this function:
import pandas as pd
def convert_csv_to_parquet(path_to_csv):
df = pd.read_csv(path_to_csv)
df.to_parquet(path_to_csv.replace(".csv", ".parq"), index=False)
Which goes into the direction of Rakesh Govindulas' comment.

How to read CSV file using Pandas (Jupyter notebooks)

(Very new coder, first time here, apologies if there are errors in writing)
I have a csv file I made from Excel called SouthKoreaRoads.csv and I'm supposed to read that csv file using Pandas. Below is what I used:
import pandas as pd
import os
SouthKoreaRoads = pd.read_csv("SouthKoreaRoads.csv")
I get a FileNotFoundError, and I'm really new and unsure how to approach this. Could anyone help, give advice, or anything? Many thanks in advance
just some explanation aside. Before you can use pd.read_csv to import your data, you need to locate your data in your filesystem.
Asuming you use a jupyter notebook or pyton file and the csv-file is in the same directory you are currently working in, you just can use:
import pandas as pd SouthKoreaRoads_df = pd.read_csv('SouthKoreaRoads.csv')
If the file is located in another directy, you need to specify this directory. For example if the csv is in a subdirectry (in respect to the python / jupyter you are working on) you need to add the directories name. If its in folder "data" then add data in front of the file seperated with a "/"
import pandas as pd SouthKoreaRoads_df = pd.read_csv('data/SouthKoreaRoads.csv')
Pandas accepts every valid string path and URLs, thereby you could also give a full path.
import pandas as pd SouthKoreaRoads_df = pd.read_csv('C:\Users\Ron\Desktop\Clients.csv')
so until now no OS-package needed. Pandas read_csv can also pass OS-Path-like-Objects but the use of OS is only needed if you want specify a path in a variable before accessing it or if you do complex path handling, maybe because the code you are working on needs to run in a nother environment like a webapp where the path is relative and could change if deployed differently.
please see also:
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
https://docs.python.org/3/library/os.path.html
BR
SouthKoreaRoads = pd.read_csv("./SouthKoreaRoads.csv")
Try this and see whether it could help!
Try to put the full path, like "C:/users/....".

Issue with saving CSV file how can I fix this in python pandas?

I'm having trouble dropping columns and saving the new data frame as a CSV file.
Code:
import pandas as pd
file_path = 'Downloads/editor_events.csv'
df = pd.read_csv(file_path, index_col = False, nrows= 1000)
df.to_csv(file_path, index = False)
df.to_csv(file_path)
The code executes and doesn't give any error. I've looked in my root directory but can't see any new csv file
Check file in folder in which you are running python script. And you are saving with same name, so you can check modified time to confirm it. Also you are not dropping columns as per posted code, you are just taking 1000 rows and saving it.
First: you are saving the same file that you are reading, so you won't see any new csv files. All you are doing right now is rewriting the same file.
But since I can guess you just show it as simple example of what you want to do, I will move to second:
Make sure that your path is correct. Try to write the full path, like 'c:\Users\AwesomeUser\Downloads\editor_events.csv' instead of just 'Downloads\editor_events.csv'.

How to convert a csv file to a dataframe in Python 3.6 [duplicate]

This question already exists:
Reading CSV files in Python, using Jupyter Notebook through IntelliJ IDEA
Closed 4 years ago.
Im trying to tackle the Kaggle Titanic challenge. Bear with me, as Im fairly new to data science. I was previously struggling to get the following syntax to work: my previous question(Reading CSV files in Python 3.6, using IntelliJ IDEA)
Reading CSV files in Python, using Jupyter Notebook through IntelliJ IDEA
import numpy as np
import pandas as pd
from pandas import Series,Dataframe
titanic_df = pd.read_csv('train.csv')
titanic.head()
However, using the below code, I am able to open the file and read it/print its contents, but i need to convert the data to a dataframe so that it can be worked with. Any suggestions?
file_path = '/Volumes/LACIE SETUP/Data_Science/Data_Analysis_Viz_InPython/Example_Projects/train.csv'
with open(file_path) as train_fp:
for line in train_fp:
# print(line)
This above code was able to print out the data but when I tried passing
'file_path' to:
titanic_df = pd.read_csv('file_path.csv')
i received the same error as before. Not sure what Im doing wrong. I KNOW the file 'train.csv' exists in that location because 1) i put it there and 2) its contents can be printed when pointed to its location.
So what the heck am I doing wrong??? :/
read_csv will create a Pandas DataFrame. So, as long as your file path is right, this following code should work. Also, make sure to use the file_path variable and not the string "file_path.csv"
import pandas as pd
file_path = '/Volumes/LACIE SETUP/Data_Science/Data_Analysis_Viz_InPython/Example_Projects/train.csv'
titanic_df = pd.read_csv(file_path)
titanic_df.head()

Editing a csv file with python

Ok so I'm looking to create a program that will interact with an excel spreadsheet. The idea that seemed to work the most is converting it to a csv file. I've managed to make a program that prints the data but I want it to edit it and thus change the results in the csv file itself.
Sorry if it's a bit confusing as my programming skills aren't great.
Heres the code:
import csv
with open('wert.csv') as csvfile:
freq=csv.reader(csvfile, delimiter=',')
for row in freq:
print(row[0],row[1],row[2])
If anyone has a better idea on how to make this program work then it would be greatly appreciated.
Thanks
You could try using the pandas package, a widely used data analysis/manipulation library.
import pandas as pd
data = pd.read_csv('foo.csv')
#change data here, see pandas documentation
data.to_csv('bar.csv')
You can find the docs here
If you csv file is composed of just numbers (floats) or numbers and a header, you can try reading it with:
import numpy as np
data=np.genfromtxt('name.csv',delimiter=',',skip_header=1)
Then modify your data in python, and save it with:
data_modified=data**2 #for example
np.savetxt('name_modified.csv',data_modified,delimiter=',',header='whaterverheader,you,want')
You can read the excel file directly using pandas and do the processing directly
import pandas
measured_data = pandas.read_excel(filename)

Categories

Resources