How to fix the CSV output when using pandas and groupby - python

Well, I have a simple CSV that has 2 columns and about 50 rows.
The first column is ip and the other is cik, and I want to get how many ips there are for each different cik. This is my code that does that, and it works great:
code:
import pandas as pd
csv = pd.read_csv('test.csv')
df = pd.DataFrame(csv)
df = df.groupby('cik').count()
df = pd.DataFrame(df).to_csv('output.csv', index=False)
But the csv output is like:
ip
49
And I want it to be like what I see when I print the df value after groupby and count, something like this:
So I have the cik in the first column and in the other the number of ips that have that cik.

Your option index=False makes the method omit the row names, which in your case are the cik values (the 1515671). Save it simply with:
df.to_csv('output.csv')

Try adding reset_index before you output to_csv.
import pandas as pd
csv = pd.read_csv('test.csv')
df = pd.DataFrame(csv)
df = df.groupby('cik').count().reset_index() #reset_index creates 0...n index and avoids cik as index
df.to_csv('output.csv', index=False)
OR
set the index=True while outputting to_csv
df.to_csv('output.csv', index=True)
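If you also want the count column to have a more descriptive name than ip, here is a small variant sketch (the name ip_count is just an assumption, pick whatever suits you) using groupby(...).size():
import pandas as pd
df = pd.read_csv('test.csv')
# Count the rows (i.e. the ips) per cik and give the count an explicit name.
counts = df.groupby('cik').size().reset_index(name='ip_count')
# cik stays a regular column, so index=False is safe here.
counts.to_csv('output.csv', index=False)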

Related

Can I read a CSV with columns that have specific value(s) using Pandas?

I want to read a CSV using Pandas, but only certain columns and only rows with specific values. For example, I have a CSV of "people and their heights"; I want to read the "name" and "height" columns only for people whose height is > 160 cm. I want to do this in the read_csv() step itself, not after loading it.
import pandas as pd
cols = ['name','height']
df = pd.read_csv("people_and_heights.csv", usecols=cols)
So I want to add a condition to read only rows with certain values, or rows that don't have nulls, for example.
How about this?:
import pandas as pd
from io import StringIO
with open("people_and_heights.csv") as file:
colNames = "\"col1\",\"name\",\"col3\",\"height\""
filteredCsv = "\n".join([colNames,"".join([line for index,line in enumerate(file) if index != 0 and int(line.split(',')[3]) >= 165])])
df = pd.read_csv(StringIO(filteredCsv),usecols=["name","height"])
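If the file is large, another option is to filter while reading in chunks instead of pre-filtering the text by hand; a rough sketch, assuming height is stored as a plain number and using the 160 cm threshold from the question:
import pandas as pd
# Stream the file in chunks, keeping only the wanted columns and the matching rows.
chunks = pd.read_csv("people_and_heights.csv", usecols=["name", "height"], chunksize=10000)
df = pd.concat(chunk[chunk["height"] > 160] for chunk in chunks)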

How to read excel data only after a string is found but without using skiprows

I want to read the data after the string "Executed Trade". I want to do that dynamically, not using "skiprows". I know openpyxl can be an option, but I am still struggling to do so. Could you please help me with this, because I have many files like the one shown in the image.
Try:
import pandas as pd
#change the Excel filename and the two mentions of 'col1' for whatever the column is
df = pd.read_excel('dictatorem.xlsx')
df = df.iloc[df.col1[df.col1 == 'Executed Trades'].index.tolist()[0]+1:]
df.columns = df.iloc[0]
df = df[1:]
df = df.reset_index(drop=True)
print(df)
Example input/output:
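If the marker string might not sit in the first column, a variant of the same idea (only a sketch; header=None and the file name are assumptions) scans every cell for it before slicing:
import pandas as pd
df = pd.read_excel('dictatorem.xlsx', header=None)
# Find the first row that contains 'Executed Trades' in any column.
mask = df.apply(lambda col: col.astype(str).str.contains('Executed Trades', na=False))
marker_row = mask.any(axis=1).idxmax()
# The row after the marker holds the real headers; everything below it is data.
df.columns = df.iloc[marker_row + 1]
df = df.iloc[marker_row + 2:].reset_index(drop=True)
print(df)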

Unable to read a column of an Excel file by column name using Pandas

Excel Sheet
I want to read the values of the column 'Site Name', but in this sheet the position of that column is not fixed.
I tried,
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites', usecols=['Site Name'])
but got a ValueError:
ValueError: Usecols do not match columns, columns expected but not found: ['RBS Name']
The output should be, List of RBS=['TestSite1', 'TestSite2',........]
Try reading the Excel columns like this:
import pandas as pd
df = pd.read_excel('File.xlsx', sheet_name='Sheet1')
for i in df.index:
    print(df['Site Name'][i])
You can first read the Excel file without specifying any column names, then inspect the dataframe and its column names.
Code is as below:
import pandas as pd
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites')
print(df.head())
print(df.columns)
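That ValueError is often caused by stray whitespace in the header cells or by a header that is not on the first row. A small sketch (the sheet name is from the question; everything else is an assumption) that strips the headers before selecting the column:
import pandas as pd
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites')
# Strip stray whitespace so 'Site Name ' still matches 'Site Name'.
df.columns = df.columns.astype(str).str.strip()
site_names = df['Site Name'].dropna().tolist()
print(site_names)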

Python Pandas doesn't start on the first column

I loaded a CSV file using Python Pandas and want to drop every second column. I can't access the file from the first to the last column. My CSV file has only one row with no captions. The original file has about 1000 columns; for testing I use 12 columns. How do I access the columns from the first to the last?
I try to drop the first column by index. Later I want to iterate through it. I expect an index like 0 to 11 or 1 to 12. Here is my code:
import pandas as pd
df = pd.read_csv("test.csv", index_col=0)
print(len(df.columns)) #returns 11 - expected: 12
df.drop(df.columns[0], axis=1)
df.to_csv('output.csv')
The code works, but with index 0 it drops the second column instead of the first, and index 2 drops the fourth one, and so on...
Hope you can help me
I've edited my code. Not pretty, but it works:
import pandas as pd
fileName = 'test.csv'
dummy = pd.read_csv(fileName)
length = len(dummy.columns)
del dummy
df = pd.read_csv(fileName, usecols=[i for i in range(length) if i%2==0])
df.to_csv('output.csv', index=False)
Thank you for your answers
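A shorter route (just a sketch, assuming the file really has a single row and no header) is to drop index_col entirely and slice every second column with iloc:
import pandas as pd
# header=None keeps the single data row as data, and no column is
# swallowed as the index, so column 0 really is the first column.
df = pd.read_csv('test.csv', header=None)
# Keep columns 0, 2, 4, ... (i.e. drop every second column).
df = df.iloc[:, ::2]
df.to_csv('output.csv', index=False, header=False)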

Pandas DataFrame saving into CSV file

I wonder how to save a new pandas Series into a CSV file in a different column. Suppose I have two CSV files which both contain a column 'A'. I have applied some mathematical function to them and then created a new variable 'B'.
For example:
data = pd.read_csv('filepath')
data['B'] = data['A']*10
# and append data.B to a list: B_list.append(data.B)
This continues until all of the rows of the first and second CSV files have been read.
I would like to save column B from both CSV files into a new spreadsheet.
For example, I need this result:
column1 (from csv1)    column2 (from csv2)
data.B.value           data.b.value
By using this code:
pd.DataFrame(np.array(B_list)).T.to_csv('file.csv', index=False, header=None)
I don't get my preferred result.
Since each column in a pandas DataFrame is a pandas Series, your B_list is actually a list of pandas Series, which you can pass to the DataFrame() constructor and then transpose (or, as @jezrael shows, merge horizontally with pd.concat(..., axis=1)):
finaldf = pd.DataFrame(B_list).T
finaldf.to_csv('output.csv', index=False, header=None)
And should the CSV files have different numbers of rows, the unequal Series are filled with NaNs at the corresponding rows.
I think you need to concat the column from data1 with the column from data2 first:
df = pd.concat(B_list, axis=1)
df.to_csv('file.csv', index=False, header=None)
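Putting the pieces together, a minimal end-to-end sketch (the file names and the *10 transformation are placeholders taken from the question):
import pandas as pd
b_list = []
for path in ['csv1.csv', 'csv2.csv']:  # placeholder file names
    data = pd.read_csv(path)
    data['B'] = data['A'] * 10  # the example transformation from the question
    b_list.append(data['B'])
# One column per input file, side by side; shorter columns are padded with NaN.
result = pd.concat(b_list, axis=1)
result.columns = ['column1 (from csv1)', 'column2 (from csv2)']
result.to_csv('file.csv', index=False)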
