Creating list from imported CSV file with pandas - python

I am trying to create a list from a CSV. This CSV contains a two-dimensional table [540 rows and 8 columns], and I would like to create a list that contains the values of a specific column, column 4 in this case.
I tried list(df.columns.values)[4], which does give me the name of the column, but I'm trying to get the values from the rows in column 4 and turn them into a list.
import pandas as pd
import urllib
#This is the empty list
company_name = []
#Loading the CSV file
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
#Extracting the list of all company names from the column "Name of Stock"
companies_column = list(df.columns.values)[4] #This only returns the name of the column.

companies_column = list(df.iloc[:,4].values)

So for this you can just add the following line after the code you've posted:
company_name = df[companies_column].tolist()
This gets the data in that column as a pandas Series (essentially a Series is just a fancy list) and then converts it to a regular Python list.
Or, if you were to start from scratch, you can also just use these three lines:
import pandas as pd
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
company_name = df[df.columns[4]].tolist()
Another option: if this is the only thing you need to do with your CSV file, you can get away with just the csv module that ships with Python instead of installing pandas; a rough sketch of that approach is shown below.
If you want to learn more about how to get data out of your pandas DataFrame (the df variable in your code), you might find this blog post helpful.
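A minimal sketch of the csv-module approach, assuming the same file path as in the question and that the company names sit at index 4 (the column the question calls column 4) with a header row:
import csv

company_name = []
with open(r'Downloads\Dropped_Companies.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        company_name.append(row[4])  # value at column index 4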

I think you can try this to get all the values of a specific column:
companies_column = df["column name"]
Replace "column name" with the name of the column whose values you want to access.

Related

I have to extract all the rows in a .csv corresponding to the rows with 'watermelon' through pandas

I am using this code, but instead of a new .csv with just the required rows, I'm getting an empty .csv with just the header.
import pandas as pd
df = pd.read_csv("E:/Mac&cheese.csv")
newdf = df[df["fruit"]=="watermelon"+"*"]
newdf.to_csv("E:/Mac&cheese(2).csv",index=False)
I believe the problem is in how you select the rows containing the word "watermelon". Instead of:
newdf = df[df["fruit"]=="watermelon"+"*"]
Try:
newdf = df[df["fruit"].str.contains("watermelon")]
In your example, pandas is looking for cells whose value is literally the string "watermelon*"; == tests for exact equality, so the asterisk is not treated as a wildcard.
Also check for typos: a missing underscore in pd.read_csv on the first call, and the file location may be incorrect (missing separators in the path).
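Putting it together, a minimal sketch of the full script using the file paths from the question; na=False is added so rows with a missing value in the "fruit" column are simply treated as non-matches rather than raising an error:
import pandas as pd

df = pd.read_csv("E:/Mac&cheese.csv")
newdf = df[df["fruit"].str.contains("watermelon", na=False)]
newdf.to_csv("E:/Mac&cheese(2).csv", index=False)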

Exporting several scraped tables into a single CSV File

How can I concatenate the tables read from several HTML pages? I understand they are returned as lists, and I don't see how those can be concatenated, but then how can I get more than one table, each scraped from a different URL, into one single CSV? Any ideas? Is it possible to save the print output in a variable and then write it to a CSV?
import pandas as pd
df = pd.read_html('URL')
df1 = pd.read_html('URL')
print(df, df1)
(**df,df1**).to_csv('name.csv')
The (df, df1) part is of course incorrect; I just wrote it to describe what I am missing.
Thank you very much in advance
pd.read_html returns a list of DataFrames. So, if you are sure that the lists contain DataFrames formatted in a way that can be concatenated, you can consolidate them into a single DataFrame and then export it to CSV:
import pandas as pd
dframes_list1 = pd.read_html('URL1')  # list of DataFrames from the first page
dframes_list2 = pd.read_html('URL2')  # list of DataFrames from the second page
dframes_all = dframes_list1 + dframes_list2  # join the two lists
consolidated_dframe = pd.concat(dframes_all)  # stack all tables into one DataFrame
consolidated_dframe.to_csv('name.csv')
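If there are more than two pages, a loop keeps this tidy. A sketch, assuming the URLs are collected in a list (the URL names here are placeholders); ignore_index=True renumbers the rows and index=False keeps the index out of the file:
import pandas as pd

urls = ['URL1', 'URL2', 'URL3']  # hypothetical list of pages to scrape
dframes_all = []
for url in urls:
    dframes_all.extend(pd.read_html(url))  # each call returns a list of DataFrames

consolidated_dframe = pd.concat(dframes_all, ignore_index=True)
consolidated_dframe.to_csv('name.csv', index=False)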

How can I extract specific rows that contain a specific keyword from my JSON dataset using pandas in Python?

Sorry, this might be a very simple question, but I am new to Python/JSON and everything. I am trying to filter my Twitter JSON data set based on user_location/country_code/gb, but I have no idea how to do this. I have tried several ways with no luck. I have attached my data set and some code I have used here. I would appreciate any help.
Here is what I did to get the best result so far; however, I do not know how to make it go through the whole data set and print out the matching tweet_id values:
import json
import pandas as pd
df = pd.read_json('example.json', lines=True)
if df['user_location'][4]['country_code'] == 'th':
    print(df.tweet_id[4])
else:
    print('false')
This code shows me the tweet_id: 1223489829817577472.
However, I couldn't extend it to the whole data set.
I have tried this code as well, still no luck:
dataset = df[df['user_location'].isin([ "gb" ])].copy()
print (dataset)
This is what my data set looks like:
I would break the user_location column into multiple columns using the following
df = pd.concat([df, df.pop('user_location').apply(pd.Series)], axis=1)
Running this should give you a column each for the keys contained within the user_location json. Then it should be easy to print out tweet_ids based on country_code using:
df[df['country_code']=='th']['tweet_id']
An explanation of what is actually happening here:
df.pop('user_location') removes the 'user_location' column from df and returns it at the same time
With the returned column, we use the .apply method to apply a function to the column
pd.Series converts each JSON dictionary into a Series; applied across the column, this produces a DataFrame with one column per key
pd.concat concatenates the original df (now without the 'user_location' column) with the new columns created from the 'user_location' data
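As an alternative, a sketch using pd.json_normalize, which pandas provides for flattening nested records; this assumes every row of user_location holds a dictionary and that the DataFrame has its default integer index so the pieces line up:
import pandas as pd

df = pd.read_json('example.json', lines=True)
# Expand the nested user_location dictionaries into their own columns
loc = pd.json_normalize(df['user_location'].tolist())
df = pd.concat([df.drop(columns='user_location'), loc], axis=1)
# Print the tweet_id values for rows whose country_code is 'gb'
print(df.loc[df['country_code'] == 'gb', 'tweet_id'])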

How do I remove a column from a dataframe using Pandas library for python?

Trying to remove a column from a DataFrame with a simple line of code using pandas for Python.
The name of the column that I'm trying to remove is "Comments".
import pandas as pd
location2 = 'datasets_travel_times.csv'
travelTime_df= pd.read_csv(location2)
traveltime_df = travelTime_df.drop('Comments',1)
traveltime_df
It does not give any error, but when I print the DataFrame I see that the column "Comments" is still there.
Here are two possible ways:
First way:
travelTime_df.drop(['Comments'], axis=1)
Second way:
travelTime_df.drop(columns=['Comments'])
In both cases, drop returns a new DataFrame rather than modifying the original, so assign the result back (for example travelTime_df = travelTime_df.drop(columns=['Comments'])). Also watch the capitalization of your variable names: traveltime_df and travelTime_df are two different variables.
Link for a deeper dive:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
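A minimal sketch tying this back to the question's variables (the CSV file name is the one posted there); printing the columns before and after shows that the drop took effect:
import pandas as pd

travelTime_df = pd.read_csv('datasets_travel_times.csv')
print(travelTime_df.columns)   # "Comments" is still listed here
travelTime_df = travelTime_df.drop(columns=['Comments'])
print(travelTime_df.columns)   # "Comments" is gone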

Create new column in panda data frame

I'm extremely new to Python and have been searching Google and Stack Overflow to solve this issue, which I am sure is simply a syntax problem.
I have a data frame with several columns.
import pandas as pd
df = pd.read_csv("C:/path/file.csv")
My csv has 5 columns and ~ 100k rows
I simply want a substring of the first 2 digits of column 5.
I've tried:
df.assign(new = lambda x: x.column5[0:2],)
This creates the new field and populates the first two rows with the complete value in column 5 and gives me NaN for the remainder.
These attempts give me syntax errors:
df['new'] = df['column5'].str[0:2]
df.map(lambda df['column5']: [:2])
I am simply at a loss as to how to create a new column using the first two digits of an existing column from a table read in via pandas.
If this were SAS I'd have been done hours ago, but I am trying to make a go of Python, so your help is appreciated.
I guess your column5 column is of an int*/float* dtype, so try converting it to string first:
df['new'] = df['column5'].astype(str).str[:2]
Alternatively, you can explicitly specify column dtypes when reading the CSV file:
df = pd.read_csv('file_name.csv', ..., dtype={'column5': object})
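A quick sketch with a made-up DataFrame (the column name column5 is taken from the question, the values are invented) showing that the .astype(str).str[:2] approach yields the first two digits as strings:
import pandas as pd

df = pd.DataFrame({'column5': [20210104, 19991231, 20200615]})  # hypothetical values
df['new'] = df['column5'].astype(str).str[:2]
print(df['new'].tolist())  # ['20', '19', '20']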
