A for loop to pull df.columns into custom code - Python

Please be gentle, total Python newbie here. I'm currently writing a script which is turning out to be very long, and I thought there must be a for loop method to make this easier. I'm currently going through a CSV, pulling the header titles and placing each one within a str.replace call, manually.
df['Col 1'] = df['Col 1'].str.replace('text','replacement')
I figured it would start like this, but I have no idea how to proceed!
Import pandas as pd
df = pd.read_csv('file.csv')
for row in df.columns:
if (df[,:] =...
Sorry I know this probably looks terrible, but this is all I could fathom with my limited knowledge!
Thanks!

jezrael's comment solved it much more elegantly.
But, in case you needed specific code for each column it would go something like this:
import pandas as pd
df = pd.read_csv('file.csv')
for column in df.columns:
    df[column] = df[column].str.replace('text','replacement')

No worries! We've all been there.
Your import statement should be lowercase: import pandas as pd
In your for loop, I think there's a misunderstanding of what you'll be iterating over. The for row in df.columns will iterate over the column names, not the rows.
Is it correct to say that you'd like to convert the column names to strings?
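In the meantime, a tiny demo (with a made-up two-column frame) of what the loop variable actually holds:
import pandas as pd

df = pd.DataFrame({'Col 1': ['a'], 'Col 2': ['b']})
for col in df.columns:
    print(col)   # prints 'Col 1', then 'Col 2' -- the column labels, not the row data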

You can do a multiple-column replacement in one shot with replace by passing in a dictionary.
Say you want to replace t1 with r1 in column a, and t2 with r2 in column b; you can do:
df.replace({"a":{"t1":"r1"}, "b":{"t2":"r2"}})
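For instance, a minimal self-contained sketch of that dictionary form (the frame and values here are made up; note that replace matches whole cell values by default, unlike str.replace, which substitutes substrings):
import pandas as pd

df = pd.DataFrame({"a": ["t1", "x"], "b": ["t2", "y"]})
out = df.replace({"a": {"t1": "r1"}, "b": {"t2": "r2"}})
print(out)
#     a   b
# 0  r1  r2
# 1   x   y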

df = pd.read_csv('file.csv',
                 usecols=['List of column names you want to use from your csv'],
                 names=['list of names of column you want your pandas df to have'])
You should read the docs and identify the fields that are important in your case.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
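As a minimal sketch, assuming the CSV has header columns named 'Col 1' and 'Col 2' (placeholder names), you could keep only those two and rename them after the load, which sidesteps the interaction between usecols= and names=:
import pandas as pd

# Load only the two columns we care about (names are placeholders)
df = pd.read_csv('file.csv', usecols=['Col 1', 'Col 2'])

# Rename after the fact
df = df.rename(columns={'Col 1': 'first', 'Col 2': 'second'})
print(df.columns.tolist())   # ['first', 'second']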

Related

How can I extract specific rows which contain a specific keyword from my JSON dataset using pandas in Python?

Sorry, this might be a very simple question, but I am new to Python/JSON and everything. I am trying to filter my Twitter JSON data set based on user_location/country_code/gb, but I have no idea how to do this. I have tried several ways but still no luck. I have attached my data set and some of the code I have used here. I would appreciate any help.
Here is what I did to get the best result; however, I do not know how to make it run over the whole data set and print out the resulting tweet_id:
import json
import pandas as pd
df = pd.read_json('example.json', lines=True)
if df['user_location'][4]['country_code'] == 'th':
    print(df.tweet_id[4])
else:
    print('false')
This code shows me the tweet_id: 1223489829817577472.
However, I couldn't extend it to the whole data set.
I have tried this code as well, still no luck:
dataset = df[df['user_location'].isin([ "gb" ])].copy()
print (dataset)
This is what my data set looks like:
I would break the user_location column into multiple columns using the following:
df = pd.concat([df, df.pop('user_location').apply(pd.Series)], axis=1)
Running this should give you a column for each of the keys contained within the user_location JSON. Then it should be easy to print out tweet_ids based on country_code using:
df[df['country_code']=='th']['tweet_id']
An explanation of what is actually happening here:
df.pop('user_location') removes the 'user_location' column from df and returns it at the same time
With the returned column, we use the .apply method to apply a function to the column
pd.Series converts the JSON data/dictionary into a DataFrame
pd.concat concatenates the original df (now without the 'user_location' column) with the new columns created from the 'user_location' data
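A self-contained sketch of the same steps, using made-up tweet data in place of the asker's Twitter export:
import pandas as pd

df = pd.DataFrame({
    'tweet_id': [1, 2, 3],
    'user_location': [
        {'country_code': 'th', 'city': 'Bangkok'},
        {'country_code': 'gb', 'city': 'London'},
        {'country_code': 'gb', 'city': 'Leeds'},
    ],
})

# Expand the dict column into one column per key, then glue it back on
df = pd.concat([df, df.pop('user_location').apply(pd.Series)], axis=1)

print(df[df['country_code'] == 'gb']['tweet_id'])   # tweet_ids 2 and 3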

How to split a string by multiple conditions?

I am trying to split a column of data into multiple columns based on a condition ",,".
But it should also split the data when it encounters ",,,,".
Basically it should treat ",," and ",,,," the same way.
My code
import pandas as pd
df = pd.DataFrame()
df['data'] = data
df
df.columns = ['header']
final = df["header"].str.split(",,",n = 2, expand = True)
final
Thanks for your help !
If you just need to split a string with more than one delimiter,
you can use re.split(string=your_string, pattern=',,,,|,,')
after importing re.
If you need something specific for Pandas, I don't know that.
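A small sketch of the regex route (the sample string and frame are made up); pandas' own Series.str.split also accepts the pattern, since multi-character patterns are treated as regular expressions:
import re
import pandas as pd

s = "a,,b,,,,c"
print(re.split(pattern=',,,,|,,', string=s))          # ['a', 'b', 'c']

df = pd.DataFrame({'header': ['a,,b,,,,c']})
print(df['header'].str.split(',,,,|,,', expand=True))  # three columns: a, b, c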

How to create a column in the existing dataframe with independent row values

I have a dataframe like this in pandas, where A is the column name:
results.head()
A
when you are away
when I was away
when they are away
I want to add a new column B which would look like the following:
A B
when you are away you
when I was away I
when they are away they
I tried with this code but it did not work:
results.assign(B = you, I, they)
I am new to pandas dataframe and would very much appreciate the help.
Try this:
B_list = ['you','I','they']
results['B'] = B_list
OR
results = results.assign(B=['you', 'I', 'they'])
If you are interested in placing the column in a specific location (index), then use the insert method:
results.insert(index, "ColName", new_set)
otherwise, the answer by Mayank is simpler.
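For instance, a short self-contained sketch (rebuilding the asker's frame so it runs on its own) that places B at position 1, directly after A:
import pandas as pd

results = pd.DataFrame({'A': ['when you are away',
                              'when I was away',
                              'when they are away']})
results.insert(1, 'B', ['you', 'I', 'they'])   # loc, column name, values
print(results)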

displaying Pandas DataFrame in HTML without the extra row

If I use DataFrame.set_index, I get this result:
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
                   ['baz',4,2.85],['quux',3,2.82]],
                  columns=['name','order','gpa'])
df.set_index('name')
Note the unnecessary row... I know it does this because it reserves the upper left cell for the column title, but I don't care about it, and it makes my table look somewhat unprofessional if I use it in a presentation.
If I don't use DataFrame.set_index, the extra row is gone, but I get numeric row indices, which I don't want:
If I use to_html(index=False) then I solve those problems, but the first column isn't bold:
import pandas as pd
from IPython.display import HTML
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
                   ['baz',4,2.85],['quux',3,2.82]],
                  columns=['name','order','gpa'])
HTML(df.to_html(index=False))
If I want to control styling to make the names boldface, I guess I could use the new Styler API via HTML(df.style.do_something_here().render()) but I can't figure out how to achieve the index=False functionality.
What's a hacker to do? (besides construct the HTML myself)
I poked around in the source for Styler and figured it out; if you set df.index.names = [None] then this suppresses the "extra" row (along with the column header that I don't really care about):
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
                   ['baz',4,2.85],['quux',3,2.82]],
                  columns=['name','order','gpa'])
df = df.set_index('name')
df.index.names = [None]
df
These days pandas actually has a keyword for this:
df.to_html(index_names=False)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html
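A sketch of that keyword applied to the frame from the question: set_index keeps the bold name column, and index_names=False suppresses the extra header row that carried the index label:
import pandas as pd
from IPython.display import HTML

df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
                   ['baz',4,2.85],['quux',3,2.82]],
                  columns=['name','order','gpa'])
HTML(df.set_index('name').to_html(index_names=False))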

Concatenate Excel data with Python or Excel

Here's my problem: I have an Excel sheet with 2 columns (see below).
I'd like to print (in the Python console or in an Excel cell) all the data in this form:
"1" : ["1123","1165", "1143", "1091", "n"], *** n ∈ [A2; A205]***
We don't really care about column B, but I need to add every postal code in this specific form.
Is there a way to do it with Excel or in Python with pandas? (If you have any other ideas I would love to hear them.)
Cheers
I think you can use parse_cols to parse the first column and then filter out all rows from 205 to 1000 with skiprows in read_excel:
df = pd.read_excel('test.xls',
                   sheet_name='Sheet1',
                   parse_cols=0,
                   skiprows=list(range(205,1000)))
print (df)
Last, use tolist to convert the first column to a list:
print({"1": df.iloc[:,0].tolist()})
The simplest solution is to parse only the first column and then use iloc:
df = pd.read_excel('test.xls',
                   parse_cols=0)
print({"1": df.iloc[:206,0].astype(str).tolist()})
I am not familiar with Excel, but pandas can easily handle this problem.
First, read the Excel file into a DataFrame:
import pandas as pd
df = pd.read_excel(filename)
Then, print as you like
print({"1": list(df.iloc[0:N]['A'])})
where N is the number of rows you would like to print. That is it. If the list is not a list of strings, you need to cast the ints to strings.
Also, there are a lot of parameters in read_excel that control how the Excel file is loaded; you can go through the documentation to set suitable ones.
Hope this is helpful to you.
