Supplying the values from first column of dataframe - python

In one code snippet, the authors provide the input as:
variants = [ 'rs425277', 'rs1571149', 'rs1240707', 'rs1240708', 'rs873927', 'rs880051', 'rs1878745', 'rs2296716', 'rs2298217', 'rs2459994' ]
However, I have similar values as one of the columns in a CSV file. How can I supply that column as input, as in the example above?
Thanks in advance

First, import your CSV as a pandas DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
Then you can get a list from a DataFrame column:
col_one_list = df['column_one'].tolist()
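If you want the first column without knowing its name, a positional lookup is one option. The following is a minimal sketch: the CSV text and the column names `variant` and `chrom` are made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice, pass your file path to read_csv.
csv_text = "variant,chrom\nrs425277,1\nrs1571149,1\nrs1240707,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# iloc[:, 0] selects every row of the first column by position.
variants = df.iloc[:, 0].tolist()
print(variants)
```

The resulting `variants` is a plain Python list, so it can be passed wherever the original hard-coded list was used.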

Related

Adding rows using timestamp

I saw the code in "combine rows and add up value in dataframe",
but I want to add up the values in cells for the same day, i.e. sum all the data for a day. How do I modify the code to achieve this?
Check the code below:
import pandas as pd
df = pd.DataFrame({'Price': [10000, 10000, 10000, 10000, 10000, 10000],
                   'Time': ['2012.05', '2012.05', '2012.05', '2012.06', '2012.06', '2012.07'],
                   'Type': ['Q', 'T', 'Q', 'T', 'T', 'Q'],
                   'Volume': [10, 20, 10, 20, 30, 10]
                   })
df.assign(daily_volume = df.groupby('Time')['Volume'].transform('sum'))
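The `transform('sum')` call above keeps every original row and repeats the group total on each of them. If you would rather end up with one row per `Time` value, a plain `groupby(...).sum()` is a possible alternative. This sketch reuses the sample data from the answer; the names `with_totals` and `daily` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [10000, 10000, 10000, 10000, 10000, 10000],
    "Time": ["2012.05", "2012.05", "2012.05", "2012.06", "2012.06", "2012.07"],
    "Type": ["Q", "T", "Q", "T", "T", "Q"],
    "Volume": [10, 20, 10, 20, 30, 10],
})

# Same shape as df: each row carries the total volume of its Time group.
with_totals = df.assign(
    daily_volume=df.groupby("Time")["Volume"].transform("sum")
)

# One row per Time value, with the volumes added up.
daily = df.groupby("Time", as_index=False)["Volume"].sum()

print(with_totals)
print(daily)
```

Which form fits depends on whether you still need the individual rows after adding up the volumes.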

Creating list from imported CSV file with pandas

I am trying to create a list from a CSV. This CSV contains a two-dimensional table [540 rows and 8 columns] and I would like to create a list that contains the values of a specific column, column 4 to be specific.
I tried list(df.columns.values)[4]. It returns the name of the column, but I'm trying to get the values from the rows in column 4 and make them a list.
import pandas as pd
import urllib
#This is the empty list
company_name = []
#Uploading CSV file
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
#Extracting list of all companies name from column "Name of Stock"
companies_column=list(df.columns.values)[4] #This returns the name of the column.
companies_column = list(df.iloc[:,4].values)
So for this you can just add the following line after the code you've posted:
company_name = df[companies_column].tolist()
This gets the data in the companies column as a pandas Series (essentially a one-dimensional labeled array) and then converts it to a regular Python list.
Or, if you were to start from scratch, you can also just use these lines:
import pandas as pd
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
company_name = df[df.columns[4]].tolist()
Another option: if this is the only thing you need to do with your CSV file, you can get away with using just the csv library that comes with Python instead of installing pandas, using this approach.
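A minimal sketch of the standard-library route could look like the following. The CSV text and every column name except "Name of Stock" are hypothetical, and `io.StringIO` stands in for opening the real file; index 4 is the fifth column because positions start at 0.

```python
import csv
import io

# Hypothetical file content; with a real file, use open(path, newline="").
csv_text = (
    "Date,Exchange,Sector,Ticker,Name of Stock\n"
    "2020-01-02,NYSE,Industrial,ACM,Acme Corp\n"
    "2020-01-03,NASDAQ,Tech,GLX,Globex\n"
)

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # first row holds the column names

# Collect the value at position 4 from every remaining row.
company_name = [row[4] for row in reader]
print(header[4], company_name)
```

Note that `csv.reader` returns every value as a string, so numeric columns would need converting by hand.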
If you want to learn more about how to get data out of your pandas DataFrame (the df variable in your code), you might find this blog post helpful.
I think you can try this to get all the values of a specific column:
companies_column = df['column name']
Replace 'column name' with the name of the column whose values you want. This returns a pandas Series; call .tolist() on it if you need a plain list.

Create a blank table (array) of infinite rows and 25 columns in Python

I am trying to create an un-initialized table or 2-d array of unlimited rows and 25 columns in Python:
table=pd.DataFrame({'A1':[],'A2':[],'A3':[] })
List_Column=['A1', 'A2'.....'A25']
But in above case, I have to manually enter the column names. Suppose I have List_Column, how can I update the column name from the List_Column?
Also, at some point I want to update the blank array with a row from a list:
List_Row=['1', 'a', '25'.....'last']
so that the final output looks like:
Based on the pandas DataFrame documentation, you can initialize a DataFrame using only the columns. The index falls back to its default if you don't provide one, so you can do
pandas.DataFrame(columns=List_Column)
From there you can add the rows in the way described by the pandas DataFrame.append() docs. Note that DataFrame.append() was removed in pandas 2.0; on current versions, use pandas.concat() or assign to table.loc[len(table)] instead.
Here is an easy way to do that. If you are loading your data from a file, there are more efficient ways to do it.
import pandas as pd
List_Column=['A1', 'A2', 'A25']
table = pd.DataFrame(columns = List_Column)
table['A1'] = [1,2,3]
table['A2'] = ['A','B','C']
table
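To avoid typing the column names by hand and to add a whole row from a list, something like the following might work. It is a sketch under assumptions: the names A1 through A25 are generated here because the question elides the real list, and the row values are placeholders.

```python
import pandas as pd

# Assumed naming scheme: A1, A2, ..., A25 (the question elides the real list).
List_Column = [f"A{i}" for i in range(1, 26)]

# Empty table: 25 columns, no rows yet.
table = pd.DataFrame(columns=List_Column)

# Placeholder row with one value per column.
List_Row = [str(i) for i in range(25)]

# Assigning to the next integer label appends the list as a new row.
table.loc[len(table)] = List_Row

print(table.shape)
```

Appending row by row is convenient for a handful of rows. For many rows, collecting them in a list and building the DataFrame once is usually faster.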

Python: Create dataframe with 'uneven' column entries

I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby. But I think this will not be the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. The output of which is:
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the other will be NaN. So you could just filter out the NaNs by doing something like df = df.loc[df["filename"].notnull()]
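If the goal is simply to show 'file1' once and leave the repeated cells empty before writing the CSV, one possible approach is to blank out the duplicates. This sketch assumes an empty string is an acceptable placeholder for the repeats; the name `csv_text` is introduced only for illustration.

```python
import pandas as pd

data = {"filename": ["file1", "file1"], "variables": ["a", "b"]}
df = pd.DataFrame(data)

# duplicated() marks every repeat of a value after its first occurrence.
repeats = df["filename"].duplicated()
df.loc[repeats, "filename"] = ""

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Pass a file name as the first argument of `to_csv` to write the result to disk instead.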

Python2.7: How to split a column into multiple columns based on special strings like this?

I'm new to programming and Python, so I would appreciate your advice!
I have a dataframe like this.
In the 'info' column, there are 7 different categories: activities, locations, groups, skills, sights, types and other. Each category has unique values within [ ] (e.g. "activities":["Tour"]).
I would like to split the 'info' column into 7 different columns, one per category, as shown below.
I would like to assign appropriate column names and also put the corresponding strings from within [ ] into each row.
Is there any easy way to split a dataframe like that?
I was thinking of using str.split to break it into pieces and merge everything later. But I'm not sure that is the best way to go, and I wanted to see if there is a more sophisticated way to build a dataframe like this.
Any advice is appreciated!
--UPDATE--
When I run print(dframe['info']), it shows this:
It looks like the content of the info column is JSON-formatted, so you can parse that into a dict object easily:
>>> import json
>>> s = '''{"activities": ["Tour"], "locations": ["Tokyo"], "groups": []}'''
>>> j = json.loads(s)
>>> j
{u'activities': [u'Tour'], u'locations': [u'Tokyo'], u'groups': []}
Once you have the data as a dict, you can do whatever you like with it.
OK, here is how to do it:
import pandas as pd
import ast
#Initial Dataframe is df
mylist = list(df['info'])
mynewlist = []
for l in mylist:
    mynewlist.append(ast.literal_eval(l))
df_info = pd.DataFrame(mynewlist)
#Add columns of decoded info to the initial dataset
df_new = pd.concat([df,df_info],axis=1)
#Remove the column info
del df_new['info']
You can use the json library to do that.
1) Import the json library
import json
2) Convert every row of that column to a string, apply json.loads to each of them, and store the result in an object
jsonO = df['info'].map(str).apply(json.loads)
3) jsonO is now a pandas Series of dicts, one per row, that you can navigate by key. For each key you are interested in, create a column in your final dataframe
df['activities'] = jsonO.apply(lambda x: x['activities'])
Here, each row's value for that key is copied into the new column of your final dataframe df
4) Repeat step 3 for all the columns you're interested in
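The steps above can also be done for all keys at once by building a second dataframe from the parsed dicts. The sketch below is written for Python 3 rather than 2.7, and the sample rows and the `id` column are made up for illustration; it assumes every 'info' value is valid JSON.

```python
import json

import pandas as pd

# Hypothetical sample data shaped like the 'info' column in the question.
df = pd.DataFrame({
    "id": [1, 2],
    "info": [
        '{"activities": ["Tour"], "locations": ["Tokyo"], "groups": []}',
        '{"activities": ["Hike"], "locations": ["Kyoto"], "groups": ["Family"]}',
    ],
})

# Parse each JSON string into a dict.
parsed = df["info"].apply(json.loads)

# A list of dicts becomes one column per key; reuse df's index so rows align.
info_columns = pd.DataFrame(parsed.tolist(), index=df.index)

# Replace the raw 'info' column with the expanded columns.
result = pd.concat([df.drop(columns="info"), info_columns], axis=1)
print(result)
```

Each new cell still holds a Python list, so a further step would be needed if you want plain strings instead.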
