Reading csv file with no column names python - python

Hi, I have a dataset with 80 columns and 12,500 rows. None of the columns have names, and there are blank columns between them. Here is a sample row:
8,1,,0,1993,146,,2,1,,,,,,,,,,2.1,0.65,0.15,0.65,19.1,,18.03,,,19.6,,,0.06,,,,,,19.1,19.6294717,19.36473585,0.06,,,,51.25,19.3,23.3,-0.04,-0.04,0.34,0.07,0.16,,,0.16,,,,,,,,,,,,,,,,,,15.3,7.8,11.55,58,100,79,15.4,8,11.7
I want to load this CSV file and delete the extra blank columns. I thought I could do that with this line:
data = data.dropna()
but I think it only deletes rows. Furthermore, how can I access or name a particular column? This is how I read the file:
data = pd.read_csv('CollectedData.csv')

IIUC, use pandas.read_csv with header=None and pandas.DataFrame.dropna on axis=1:
data = pd.read_csv("CollectedData.csv", header=None).dropna(axis=1, how="all")
If you need to give names to each column, use the names parameter:
cols_names= ["ColA", "ColB", "ColC", ..]
data = (
pd.read_csv("CollectedData.csv",
header=None,
names=cols_names)
.dropna(axis=1, how="all")
)
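To answer the "how can I access a particular column" part, here is a minimal sketch. It assumes a tiny in-memory stand-in for CollectedData.csv (the sample text and the column names are made up for illustration). When header=None is used, the column labels are integers, and dropna keeps the original positions as labels.

```python
import io
import pandas as pd

# Hypothetical stand-in for CollectedData.csv: no header row, two empty columns.
csv_text = "8,1,,0,1993,,2\n9,3,,1,1994,,5\n"

data = pd.read_csv(io.StringIO(csv_text), header=None).dropna(axis=1, how="all")

# Without names, the surviving columns keep their original integer labels.
print(list(data.columns))
year = data[4]  # select a column by its integer label

# Optionally assign readable (made-up) names after dropping the empty columns.
data.columns = ["id", "flag", "code", "year", "count"]
print(data["year"].tolist())
```

Naming after dropna means you only need names for the columns that survive, not for all 80.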

Related

separate data from 1 column to 2 column in python

I have a dataset consisting of one column, but each entry in that column holds two values. How can I move the second value into a second column, so that my dataset has two columns? Thank you.
Here is my dataset:
I want it to look like this:
If your file is tab delimited, just use
import pandas as pd
df = pd.read_csv(file_path, sep='\t')
Try this:
Read the file directly with read_csv, passing sep='\t' as the delimiter. To skip malformed lines, set on_bad_lines='skip' (pandas 1.3 and later; older versions used error_bad_lines=False, which has since been removed).
df = pd.read_csv('SINGGALANG.tsv', sep='\t', header=None, names=["Tokens","Tags"], on_bad_lines='skip', engine="python")
Alternatively, if you want to read all the data into one column and then split it into separate columns, you can do the following.
Assuming you have a DataFrame like this:
df = pd.DataFrame(['val1 0','val2 0','val3 0'])
You can use split and create columns.
df[["col1","col2"]] = df[0].str.split(expand=True)
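For comparison, here is a small sketch of both approaches side by side. The sample tokens and tags are invented, and the "raw" column name is just a placeholder; the real file may of course look different.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for the real file.
tsv_text = "Ini\tO\nJakarta\tB-LOC\n"

# Approach 1: let read_csv split on the tab while reading.
df_tabs = pd.read_csv(
    io.StringIO(tsv_text), sep="\t", header=None, names=["Tokens", "Tags"]
)

# Approach 2: start from a single column and split on whitespace afterwards.
df_one = pd.DataFrame({"raw": ["Ini O", "Jakarta B-LOC"]})
df_one[["Tokens", "Tags"]] = df_one["raw"].str.split(expand=True)
df_one = df_one.drop(columns="raw")  # the combined column is no longer needed
```

Splitting at read time is usually simpler; the second approach is useful when the data is already loaded as one column.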

Reset labels in Pandas DataFrame, Python

I have a CSV file whose first row contains wrong data. The real label names are in row number 2. So when I load this file into a DataFrame, the column labels are incorrect and the correct names become the values of row 0. Is there any function similar to reset_index() but for columns? PS: I cannot change the CSV file. Here is an image for better understanding: DataFrame with wrong labels
Hello, let's suppose your CSV file is data.csv.
Try this code:
import pandas as pd
# read the CSV file; the wrong first row becomes the header
df = pd.read_csv('data.csv')
# replace the wrong header names with integers
df.columns = range(df.shape[1])
# save the data to another CSV file without a header row
df.to_csv('data_without_header.csv', header=False, index=False)
# read the new CSV file; its first row now holds the correct names
new_df = pd.read_csv('data_without_header.csv')
# show the first rows of the new data
new_df.head()
If you do not care about the rows preceding your column names, you can pass the "header" argument with the zero-indexed number of the correct row. For example, if the proper column names are in row 2 (the third line of the file):
df = pd.read_csv('my_csv.csv', header=2)
Keep in mind that this drops the preceding rows from the DataFrame. If you still want to keep them, read the file without a header and then copy the row into the column labels:
df = pd.read_csv('my_csv.csv', header=None)
df.columns = df.iloc[2, :] # replace columns with values in row 2
Cheers.
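A quick sketch of both options on a made-up file may help. It assumes the real names sit on the second line (zero-indexed line 1), which seems to match the question; the sample content is invented.

```python
import io
import pandas as pd

# Hypothetical file: line 0 is junk, line 1 holds the real column names.
csv_text = "wrong,stuff\nname,score\nann,1\nbob,2\n"

# Option 1: point read_csv at the zero-indexed header line; earlier lines are dropped.
df = pd.read_csv(io.StringIO(csv_text), header=1)

# Option 2: read every line as data, then promote one row to column labels.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
promoted = raw.iloc[2:].reset_index(drop=True)  # keep only the rows below the names
promoted.columns = raw.iloc[1]                  # use line 1 as the labels
```

One thing to watch with option 2: because names and numbers were read into the same columns, the values stay strings unless you convert them afterwards (for example with pd.to_numeric).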

Pandas data frame not allowing me to drop first empty column in python?

I have read in some data from a csv, and there were a load of spare columns and rows that were not needed. I've managed to get rid of most of them, but the first column is showing as an NaN and will not drop despite several attempts. This means I cannot promote the titles in row 0 to headers. I have tried the below:
df = pd.read_csv(("List of schools.csv"))
df = df.iloc[3:]
df.dropna(how='all', axis=1, inplace =True)
df.head()
But I am still getting this returned:
Any help please? I'm a newbie
You can improve your read_csv() operation.
Avloss can tell your "columns" are indices because they are bold. Looking at your output, there are two things of note.
The "columns" are bold implying that pandas read them in as part of the index of the DataFrame rather than as values
There is no information above the horizontal line at the top indicating there are currently no column names. The top row of the csv file that contains the column names is being read in as values.
To solve your column deletion problem, you should first improve your read_csv() operation by being more explicit. Your current code is placing the column headers in the data and placing some of the data in the index. Since you have the operation df = df.iloc[3:] in your code, I'm assuming the data in your CSV file doesn't start until the 4th row. Try this:
header_row = 3 #or 4 - I am bad at zero-indexing
df = pd.read_csv('List of schools.csv', header=header_row, index_col=False)
df.dropna(how='all', axis=1, inplace =True)
This code should read the column names in as column names and not index any of the columns, giving you a cleaner DataFrame to work from when dropping NA values.
Those aren't columns, those are indices. You can convert them to columns by doing
df = df.reset_index()
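Here is a rough sketch of the "read the header explicitly, then drop empty columns" idea. The file content is invented (three junk lines, then the titles, with an empty first column), so the header row number is an assumption that would need adjusting for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for "List of schools.csv".
csv_text = (
    "junk,,,\n"
    "more junk,,,\n"
    "still junk,,,\n"
    ",School,Town,Pupils\n"
    ",Oak Primary,Leeds,210\n"
    ",Elm High,York,950\n"
)

# header=3 uses the fourth line as column names and ignores the lines above it.
df = pd.read_csv(io.StringIO(csv_text), header=3)

# The empty first column is now an ordinary all-NaN column, so dropna can remove it.
df = df.dropna(how="all", axis=1)
print(df.columns.tolist())
```

Because the titles are already column names after read_csv, there is no need to promote row 0 to headers afterwards.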

How to append a column from one csv file to second csv (with different indices)

I am working on concatenating many csv files together and want to take one column, from a multicolumn csv, and append it as a new column in a second csv. The problem is that the columns have different numbers of rows so the new column that I am adding to the existing csv gets cut short once the row index from the existing csv is reached.
I have tried to read in the new column as a second dataframe and then add that dataframe as a new column to the existing csv.
df = pd.read_csv("Existing CSV.csv")
df2 = pd.read_csv("New CSV.csv", usecols = ['Desired Column'])
df["New CSV"] = df2
"Existing CSV" has 1200 rows of data while "New CSV" has 1500 rows. When I run the code, the "New CSV" column is added to "Existing CSV"; however, only the first 1200 rows of data are included.
Ideally, all 1500 rows from "New CSV" will be included and the 300 rows missing from "Existing CSV" will be left blank.
By default, read_csv gives the resulting DataFrame an integer index, so I can think of a couple of options to try.
Setup
df = pd.read_csv("Existing CSV.csv")
df2 = pd.read_csv("New CSV.csv", usecols = ['Desired Column'])
Method 1: join
df = df.join(df2['Desired Column'], how='right')
Method 2: reindex and assign (reindex_like would also align the columns to df2 and drop the existing ones)
df = df.reindex(df2.index).assign(**{'Desired Column': df2['Desired Column']})
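To see the padding behaviour on a small scale, here is a sketch with made-up frames of 3 and 5 rows instead of 1200 and 1500. The second variant uses reindex on the row index only, so the existing columns are kept.

```python
import pandas as pd

# Hypothetical stand-ins: 3 existing rows, 5 rows in the new column.
df = pd.DataFrame({"A": [1, 2, 3]})
df2 = pd.DataFrame({"Desired Column": [10, 20, 30, 40, 50]})

# A right join keeps every index label from df2; missing rows in df become NaN.
joined = df.join(df2["Desired Column"], how="right")

# Alternative: extend df to df2's row index first, then assign the new column.
extended = df.reindex(df2.index).assign(**{"Desired Column": df2["Desired Column"]})

print(joined)
```

Both results have 5 rows, with NaN in column "A" for the two rows that only exist in df2.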

How to delete or drop the column labelled "index" from a dataframe when using to_csv() to save as csv

I am reading a csv file, cleaning it up a little, and then saving it back to a new csv file. The problem is that the new csv file has a new column (first column in fact), labelled as index. Now this is not the row index, as I have turned that off in the to_csv() function as you can see in the code. Plus row index doesn't have a column label as well.
df = pd.read_csv('D1.csv', na_values=0, nrows = 139) # Read csv, with 0 values converted to NaN
df = df.dropna(axis=0, how='any') # Delete any rows containing NaN
df = df.reset_index()
df.to_csv('D1Clean.csv', index=False)
Any ideas where this phantom column is coming from and how to get rid of it?
I think you need to add the parameter drop=True to reset_index:
df = df.reset_index(drop=True)
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
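The difference is easy to see on a tiny made-up frame: after dropna leaves gaps in the index, a plain reset_index() saves the old index as a column named "index", while drop=True discards it.

```python
import pandas as pd

# Hypothetical data: the middle row is removed, leaving index labels 0 and 2.
df = pd.DataFrame({"x": [1.0, None, 3.0]}).dropna(axis=0, how="any")

with_index_col = df.reset_index()   # old index kept as a column called "index"
clean = df.reset_index(drop=True)   # old index thrown away

print(with_index_col.columns.tolist())
print(clean.columns.tolist())
```

That extra "index" column is what ends up as the first column in the saved CSV, even with index=False in to_csv().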
