How to skip Column title row in Pandas DataFrame [duplicate] - python

This question already has answers here:
Prevent pandas read_csv treating first row as header of column names
(4 answers)
Closed 3 years ago.
How to skip Column title row in Pandas DataFrame
My Code:
sample = pd.read_csv('Fremont TMY_Sample_Original.csv', low_memory=False)  # import csv
sample_header = sample.iloc[:1, 0:20]  # want to separate the first rows because these hold different data at the start
sample2 = sample.iloc[:, 0:16]  # want to take the required data for the next process
sample2 = ('sample2', (header=False))  # trying to skip the column title row (this line does not work)
print(sample2)
expected output:
This is an example.
Data for all year  (this is the row I want to remove; the remaining rows I want to keep)
Date Time(Hour) WindSpeed(m/s)
0 5 1 10
1 4 2 17
2 6 3 16
3 7 4 11

This should work
df = pd.read_csv("yourfile.csv", header = None)
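Depending on which row you actually need to drop, read_csv's header, skiprows and names arguments can be combined. A minimal sketch, assuming a file laid out like the sample above (a descriptive first row, then the title row, then the data); "yourfile.csv" is a placeholder:
import pandas as pd

# Option 1: treat no row as a header; every row (titles included) becomes data.
df_all = pd.read_csv("yourfile.csv", header=None)

# Option 2: skip the first descriptive row and use the next row as the header.
df = pd.read_csv("yourfile.csv", skiprows=1)

# Option 3: skip both leading rows and supply your own column names.
df_named = pd.read_csv(
    "yourfile.csv",
    skiprows=2,
    header=None,
    names=["Date", "Time(Hour)", "WindSpeed(m/s)"],
)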

Python - How to remove unmatched rows from two csv's based on one column without merging the two csv files [duplicate]

This question already has answers here:
Finding common rows between two dataframes based on a column using pandas
(2 answers)
Find common elements in two dataframes
(2 answers)
Closed 7 months ago.
I have 2 csv files, each having two columns A and B -
a.csv -
A B
12 0
13 1
14 1
b.csv -
A B
12 3
13 2
15 1
I want to remove the unmatched rows from both csv files based on column 'A', without merging the two files. The output I need after removing the unmatched rows is:
a.csv -
A B
12 0
13 1
b.csv -
A B
12 3
13 2
Thank you in advance.
You can use pandas:
import pandas as pd

df_a = pd.read_csv('a.csv')
df_b = pd.read_csv('b.csv')

# values of column 'A' present in both files
same_values = set(df_a['A']).intersection(df_b['A'])

# keep only the rows whose 'A' value appears in both files
df_a = df_a[df_a['A'].isin(same_values)]
df_b = df_b[df_b['A'].isin(same_values)]

# write back without adding the index as an extra column
df_a.to_csv('a.csv', index=False)
df_b.to_csv('b.csv', index=False)
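For the sample files above, a quick check (assuming the data shown in the question) should leave only the rows with A equal to 12 and 13:
print(df_a)
#     A  B
# 0  12  0
# 1  13  1
print(df_b)
#     A  B
# 0  12  3
# 1  13  2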

How to select specific row elements according to the length of each element pandas [duplicate]

This question already has answers here:
filter dataframe rows based on length of column values
(4 answers)
Closed 3 years ago.
I'm trying to drop some rows from my dataframe based on a specific column: if the element's length is different from 7, then that row should be discarded.
for i in range(len(ups_df['Purchase_Order'])):
    if len(ups_df['Purchase_Order'][i]) != 7:
        del ups_df['Purchase_Order'][i]
print(len(ups_df['Purchase_Order']))
The output I get is KeyError: 4.
The KeyError happens because del removes rows (and their index labels) while you are still iterating over the original range, so a later lookup hits a label that no longer exists. I would solve this by filtering the dataframe with a lambda on the condition, keeping only rows whose value has a length of 7. You can obviously change or adapt this to fit your needs:
filtered_df = ups_df[ups_df['Purchase_order'].apply(lambda x: len(str(x)) == 7)]
This is an example:
data = {'A':[1,2,3,4,5,6],'Purchase_order':['aaaa111','bbbb222','cc34','f41','dddd444','ce30431404']}
ups_df = pd.DataFrame(data)
filtered_df = ups_df[ups_df['Purchase_order'].apply(lambda x: len(str(x)) == 7)]
Original dataframe:
A Purchase_order
0 1 aaaa111
1 2 bbbb222
2 3 cc34
3 4 f41
4 5 dddd444
5 6 ce30431404
After filtering (dropping the rows that have a length different than 7):
A Purchase_order
0 1 aaaa111
1 2 bbbb222
4 5 dddd444
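As a side note, the same filter can be written with the vectorized string accessor instead of apply; a small sketch, assuming Purchase_order already holds strings:
filtered_df = ups_df[ups_df['Purchase_order'].str.len() == 7]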

python replace not na value [duplicate]

This question already has answers here:
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 3 years ago.
I want to create a new column where missing values in col1 are replaced with 0 and non-missing values with 1.
#df
col1
1
3
NaN
5
NaN
6
what I want:
#df
col1 NewCol
1    1
3    1
NaN  0
5    1
NaN  0
6    1
This is what I tried:
df['NewCol']=df['col1'].fillna(0)
df['NewCol']=df['col1'].replace(df['col1'].notnull(), 1)
It seems that the second line is incorrect.
Any suggestion?
You can try:
df['NewCol'] = [*map(int, pd.notnull(df.col1))]
Hope this helps.
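An equivalent and arguably more readable way to build the same 0/1 column (a small sketch of the same idea):
df['NewCol'] = df['col1'].notna().astype(int)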
First you will need to convert all 'na's into '0's. How you do this will vary by scope.
For a single column you can use:
df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
For the whole dataframe you can use (note that fillna returns a new dataframe, so assign the result back):
df = df.fillna(0)
After this, you need to replace all nonzeros with '1's. You could do this like so:
# assumes a default RangeIndex, so the enumerate position matches the row label
for index, entry in enumerate(df['col']):
    if entry != 0:
        df.loc[index, 'col'] = 1
Note that this method counts 0 as an empty entry, which may or may not be the desired functionality.
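If you want to build the 0/1 column in one step without the loop, a vectorized sketch using numpy (assuming the flag should reflect whether col1 is missing):
import numpy as np

df['NewCol'] = np.where(df['col1'].isna(), 0, 1)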

Combining pandas rows based on condition [duplicate]

This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
Given a Pandas Dataframe df, with column names 'Session', and 'List':
Can I group together the 'List' values for the same values of 'Session'?
My Approach
I've tried solving the problem by creating a new dataframe and iterating through the rows of the initial dataframe while maintaining a session counter that I increment when I see that the session has changed.
If it hasn't changed, I append the List value that corresponds to that row, followed by a comma.
Whenever the session changes, I use strip to get rid of the trailing comma.
Initial DataFrame
Session List
0 1 a
1 1 b
2 1 c
3 2 d
4 2 e
5 3 f
Required DataFrame
Session List
0 1 a,b,c
1 2 d,e
2 3 f
Can someone suggest something more efficient or simple?
Thank you in advance.
Use groupby with agg and reset_index:
>>> df.groupby('Session')['List'].agg(','.join).reset_index()
Session List
0 1 a,b,c
1 2 d,e
2 3 f
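If List were not already strings (e.g. numbers), ','.join would raise a TypeError; a small sketch of one way around that, converting to str inside the aggregation:
df.groupby('Session')['List'].agg(lambda s: ','.join(s.astype(str))).reset_index()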

How can I drop rows in a dataframe efficiently if a specific column contains a substring [duplicate]

This question already has answers here:
Pandas filtering for multiple substrings in series
(3 answers)
Closed 4 years ago.
I tried
df = df[~df['event.properties.comment'].isin(['Extra'])]
Problem is it would only drop a row if the column is exactly 'Extra', and I need to drop rows that contain it even as a substring.
Any help?
You can use str.contains to check for the substring. Since the column may contain NaN, fill it with an empty string first so the check does not fail; rows whose text contains 'Extra' are then dropped, while everything else (including '~' and the NaN row) is kept.
Consider this df:
vals ids
0 1 ~
1 2 bball
2 3 NaN
3 4 Extra text
df[~df.ids.fillna('').str.contains('Extra')]
Out:
vals ids
0 1 ~
1 2 bball
2 3 NaN
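If you need to drop rows containing any of several substrings, str.contains also accepts a regex pattern; a small sketch, assuming 'Extra' and 'Spare' are the substrings to exclude ('Spare' is only an illustrative placeholder):
pattern = 'Extra|Spare'
df = df[~df['event.properties.comment'].fillna('').str.contains(pattern)]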
