Separate data from 1 column into 2 columns in Python - python

I have a dataset consisting of one column, but each entry in that column holds two values. How can I split the second value out into its own column? I want my dataset to have 2 columns. Thank you.
Here is my dataset:
I want to make it look like this:

If your file is tab delimited, just use
import pandas as pd
df = pd.read_csv(file_path, sep='\t')

Try this:
Read the file directly with read_csv, passing sep='\t' as the delimiter. To skip malformed lines, older pandas versions used error_bad_lines=False; that parameter was deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines='skip' replaces it.
df = pd.read_csv('SINGGALANG.tsv', sep='\t', header=None, names=["Tokens", "Tags"], on_bad_lines='skip', engine="python")
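A self-contained sketch of the same kind of read_csv call, assuming pandas 1.3 or newer (where on_bad_lines is available). The sample text and its malformed third line are made up for illustration, and the default C parser is used here instead of engine="python":

```python
import io

import pandas as pd

# Made-up tab-delimited sample; the third line has one field too many.
tsv_text = "saya\tPRP\nmakan\tVB\nbad\tline\textra\nnasi\tNN\n"

df = pd.read_csv(
    io.StringIO(tsv_text),   # stands in for a path such as 'SINGGALANG.tsv'
    sep="\t",
    header=None,             # the file has no header row
    names=["Tokens", "Tags"],
    on_bad_lines="skip",     # drop lines with too many fields instead of raising
)
print(df)
```

The malformed line is dropped and the remaining lines land in the two named columns.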
Alternatively, if you want to read all the data into one column and then split it into separate columns, you can do the following.
Assuming you have a DataFrame like this:
df = pd.DataFrame(['val1 0','val2 0','val3 0'])
You can use str.split with expand=True to create the columns:
df[["col1","col2"]] = df[0].str.split(expand=True)
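If the first value can itself contain spaces, a variant that splits only once from the right may be safer. This is a minimal sketch; the column names raw, token and tag and the sample values are assumptions for illustration:

```python
import pandas as pd

# Made-up sample: the last whitespace-separated field is the tag.
df = pd.DataFrame({"raw": ["val1 0", "val2 1", "two words 2"]})

# Split at most once, starting from the right, so "two words" stays intact.
df[["token", "tag"]] = df["raw"].str.rsplit(n=1, expand=True)

# The split produces strings; convert the tag to an integer if needed.
df["tag"] = df["tag"].astype(int)

# Drop the original combined column.
df = df.drop(columns="raw")
print(df)
```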

Related

Reading csv file with no column names python

Hi, I have data with 80 columns and 12,500 rows. None of the columns have names, and there are blank columns among them:
8,1,,0,1993,146,,2,1,,,,,,,,,,2.1,0.65,0.15,0.65,19.1,,18.03,,,19.6,,,0.06,,,,,,19.1,19.6294717,19.36473585,0.06,,,,51.25,19.3,23.3,-0.04,-0.04,0.34,0.07,0.16,,,0.16,,,,,,,,,,,,,,,,,,15.3,7.8,11.55,58,100,79,15.4,8,11.7
I want to use this CSV file and delete the extra blank columns. I thought I could do that with this line:
data = data.dropna()
but I think it only deletes rows. Furthermore, how can I access or name a particular column?
data = pd.read_csv('CollectedData.csv')
If I understand correctly, use pandas.read_csv with header=None, then pandas.DataFrame.dropna with axis=1 and how="all" to drop the columns that are entirely empty:
data = pd.read_csv("CollectedData.csv", header=None).dropna(axis=1, how="all")
If you need to give names to each column, use the names parameter:
cols_names = ["ColA", "ColB", "ColC", ...]  # one name per column in the file
data = (
    pd.read_csv("CollectedData.csv",
                header=None,
                names=cols_names)
    .dropna(axis=1, how="all")
)
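To see the effect on a small scale, here is a sketch using a shortened, made-up version of the row above. It names the columns after the empty ones are dropped, and the names ColA to ColE are placeholders:

```python
import io

import pandas as pd

# Two short made-up rows; the 3rd and 6th fields are always empty.
csv_text = "8,1,,0,1993,,2\n9,2,,1,1994,,3\n"

data = pd.read_csv(io.StringIO(csv_text), header=None)

# Drop only the columns in which every value is missing.
data = data.dropna(axis=1, how="all")

# The surviving columns keep their integer labels (0, 1, 3, 4, 6);
# assign readable names once the empty ones are gone.
data.columns = ["ColA", "ColB", "ColC", "ColD", "ColE"]

# A single column can then be accessed by name.
print(data["ColD"])
```

Naming after the drop avoids having to invent names for columns that are about to be removed.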

pandas read excel without unnamed columns

I am trying to read an Excel table that looks like this:
            B      C
A           data   data
data        data   data
but read_excel doesn't recognize that one column doesn't start on the first row, and reads it like this:
Unnamed: 0  B      C
A           data   data
data        data   data
Is there a way to read the data the way I need? I have checked parameters like header=, but that's not what I need.
A similar question was asked and solved here. The easiest fix is either to read the first column as the index (if that is always the problematic column), so it no longer appears as an unnamed data column, with
df = pd.read_csv('data.csv', index_col=0)
or remove the unnamed column via
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
You can skip automatic column labeling with something like pd.read_excel(..., header=None)
With header=None, pandas does not take the labels from the first row and numbers the columns 0, 1, 2, ... instead.
You can then compute the labels yourself (e.g. the first non-empty value of each column), such as
df.apply(lambda s: s.dropna().reset_index(drop=True)[0])
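Both ideas can be tried without an Excel file. In this sketch the two small DataFrames are made-up stand-ins for what read_excel would return, and .iloc[0] is used in place of reset_index(drop=True)[0] to take the first remaining value:

```python
import numpy as np
import pandas as pd

# Stand-in for a frame that was read with an "Unnamed: 0" column.
df = pd.DataFrame({"Unnamed: 0": ["A", 3], "B": [1, 4], "C": [2, 5]})

# Keep every column whose label does not start with "Unnamed".
cleaned = df.loc[:, ~df.columns.str.contains("^Unnamed")]

# Stand-in for pd.read_excel(..., header=None): labels are 0, 1, 2.
raw = pd.DataFrame([
    [np.nan, "B", "C"],
    ["A", 1, 2],
    [3, 4, 5],
])

# First non-empty value of each column, used as that column's label.
labels = raw.apply(lambda s: s.dropna().iloc[0])
print(list(cleaned.columns), labels.tolist())
```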

I have to extract all the rows in a .csv corresponding to the rows with 'watermelon' through pandas

I am using this code, but instead of a new file with just the required rows, I'm getting an empty .csv with only the header.
import pandas as pd
df = pd.read_csv("E:/Mac&cheese.csv")
newdf = df[df["fruit"]=="watermelon"+"*"]
newdf.to_csv("E:/Mac&cheese(2).csv",index=False)
I believe the problem is in how you select the rows containing the word "watermelon". Instead of:
newdf = df[df["fruit"]=="watermelon"+"*"]
Try:
newdf = df[df["fruit"].str.contains("watermelon")]
In your example, pandas is literally looking for cells whose entire value is the string "watermelon*"; the == comparison does not treat * as a wildcard.
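A small sketch of the substring filter on made-up data. The case=False and na=False arguments are optional additions, assuming you want a case-insensitive match and want missing cells treated as non-matches:

```python
import pandas as pd

# Made-up sample with a missing value in the "fruit" column.
df = pd.DataFrame({
    "fruit": ["watermelon", "Watermelon juice", "apple", None],
    "qty": [1, 2, 3, 4],
})

# True where the cell contains "watermelon" in any letter case;
# na=False makes missing cells count as False instead of NaN.
mask = df["fruit"].str.contains("watermelon", case=False, na=False)

newdf = df[mask]
print(newdf)
```

Without na=False, a missing value in the column would put NaN in the mask, which cannot be used for boolean indexing.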
You are missing the underscore in pd.read_csv on the first call. It also looks like the file location is incorrect: the // is missing from the path.

Reset labels in Pandas DataFrame, Python

I have a CSV file whose first row contains wrong data. The label names are in row number 2, so when I load the file into a DataFrame the labels are incorrect and the correct names become the values of row 0. Is there a function similar to reset_index(), but for columns? PS: I cannot change the CSV file. Here is an image for better understanding: DataFrame with wrong labels
Hello, let's suppose your CSV file is data.csv.
Try this code:
import pandas as pd
# read the CSV file (its first row becomes the wrong header)
df = pd.read_csv('data.csv')
# replace the header names with integers
df.columns = range(df.shape[1])
# save the data to another CSV file without a header row
df.to_csv('data_without_header.csv', header=False, index=False)
# read the new file; its first row now holds the correct names
new_df = pd.read_csv('data_without_header.csv')
# preview the new data
new_df.head()
If you do not care about the rows preceding your column names, you can pass the header argument with the zero-based index of the correct row. For example, if the proper column names are in row 2 (the third line of the file):
df = pd.read_csv('my_csv.csv', header=2)
Keep in mind that this drops the preceding rows from the DataFrame. If you want to keep them, you can do the following instead:
df = pd.read_csv('my_csv.csv')
df.columns = df.iloc[2, :]  # replace columns with the values in data row 2 (zero-based, counted after the header line)
Cheers.
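For the situation in the question (wrong first line, real names on the second line), both approaches can be sketched on an in-memory file. The sample text is made up, and header=1 is used because the header argument counts lines from zero:

```python
import io

import pandas as pd

# Made-up file: a wrong first line, the real column names on the second line.
csv_text = "wrong,first,row\nname,age,city\nAnn,30,Oslo\nBob,25,Rome\n"

# Option 1: tell read_csv which line holds the names (zero-based index 1).
df = pd.read_csv(io.StringIO(csv_text), header=1)

# Option 2: read everything, then promote data row 0 to the column labels.
raw = pd.read_csv(io.StringIO(csv_text))
promoted = raw.iloc[1:].reset_index(drop=True)
promoted.columns = raw.iloc[0].tolist()

print(df)
print(promoted)
```

With option 1 pandas infers numeric dtypes as usual. With option 2 the columns stay as strings, because the names were parsed as data first.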

Select 2 ranges of columns to load - read_csv in pandas

I'm reading in an excel .csv file using pandas.read_csv(). I want to read in 2 separate column ranges of the excel spreadsheet, e.g. columns A:D AND H:J, to appear in the final DataFrame. I know I can do it once the file has been loaded in using indexing, but can I specify 2 ranges of columns to load in?
I've tried something like this....
usecols=[0:3,7:9]
I know I could list each column number individually, e.g.
usecols=[0,1,2,3,7,8,9]
but I have simplified the file in question. In my real file I have a large number of rows, so I need to be able to select 2 large ranges to read in.
I'm not sure whether there is an official, idiomatic way to do this in pandas.
But you can do it this way:
# say you want to extract 2 ranges of columns
# columns 5 to 14
# and columns 30 to 66
import pandas as pd
range1 = list(range(5, 15))
range2 = list(range(30, 67))
usecols = range1 + range2
file_name = 'path/to/csv/file.csv'
df = pd.read_csv(file_name, usecols=usecols)
As @jezrael notes, you can use numpy.r_ to do this in a more Pythonic and legible way:
import pandas as pd
import numpy as np
file_name = 'path/to/csv/file.csv'
# slice ends are exclusive: 0:4 and 7:10 select columns A:D and H:J
df = pd.read_csv(file_name, usecols=np.r_[0:4, 7:10])
Gotchas
np.r_ slices exclude their end point, so 0:3 yields only columns 0, 1, 2. When combining usecols with names, also allow for an extra leading index column if the file has one: for CSV columns 1, 2, 3 (3 items) plus that index column, np.r_ needs 0:4 (4 items).
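A quick way to check which columns a pair of ranges selects is to try it on a tiny in-memory file. In this sketch the column names c0 to c11 are made up, and the index array is converted to a plain list before being passed to usecols, which is a cautious choice, not a requirement stated in the answers above:

```python
import io

import numpy as np
import pandas as pd

# Made-up file with 12 columns named c0 ... c11 and a single data row.
header = ",".join(f"c{i}" for i in range(12))
row = ",".join(str(i) for i in range(12))
csv_text = header + "\n" + row + "\n"

# np.r_ concatenates the two ranges; slice ends are exclusive.
cols = np.r_[0:4, 7:10].tolist()   # [0, 1, 2, 3, 7, 8, 9]

df = pd.read_csv(io.StringIO(csv_text), usecols=cols)
print(list(df.columns))
```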
