How to do pandas column selection while getting code completion? [closed] - python

When working with pandas, I often use name based column indexing. E.g:
df = pd.DataFrame({"abc":[1,2], "bde":[3,4], "mde":[3,4]})
df[["mde","bde"]]
With longer column names it becomes easy for me to make a typo, since the names are plain strings and there is no code completion. It'd be great if I could do something like:
df.SelectColumnsByObjectAttributeNotString([df.mde, df.bde])

IIUC, you can use the name attribute.
df = pd.DataFrame({"a":[1,2], "b":[3,4]})
columns = [df.a.name, df.b.name]
columns
['a', 'b']
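Applied to the original frame, attribute access gives you code completion while the selection still happens by label; a minimal sketch:
import pandas as pd
df = pd.DataFrame({"abc": [1, 2], "bde": [3, 4], "mde": [3, 4]})
# df.mde and df.bde are completable attributes; .name recovers the string label
df[[df.mde.name, df.bde.name]]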

I think you may be looking for:
df.columns.values.tolist()

Using Regex to drop period from many column names [closed]

I'd like to drop a '.' from a column name using regex, and want the code to be applied to many column names that end in '.', so that each pair of like-named columns can be merged into one.
For example, the column names 'Fund' and 'Fund.' are different and have different values, but should become just 'Fund'.
What would be the best regex to use for this?
Try this (escape the dot, since a bare '.' is a regex wildcard that would match every character, and anchor it so only a trailing dot is removed):
df = pd.DataFrame([1], columns=['Fund.'])
df.columns = df.columns.str.replace(r'\.$', '', regex=True)
Output:
print(df.columns)
Index(['Fund'], dtype='object')
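Once the trailing dot is stripped, each like-named pair can be collapsed into one column; a minimal sketch, assuming you want the first non-null value from each pair:
import pandas as pd
df = pd.DataFrame({'Fund': [1.0, None], 'Fund.': [None, 2.0]})
df.columns = df.columns.str.replace(r'\.$', '', regex=True)
# Group the now-duplicate column names and keep the first non-null value per row
merged = df.T.groupby(level=0).first().T
print(merged)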

Count the number of null values using pandas framework [closed]

Write Python code to find the total number of null values in the Excel file without using the isnull function (you should use a loop statement).
Iterating over a DataFrame with for loops is not good practice. If you're doing it only as an exercise, this could help. Rather than filling missing values with False (which would also count genuine zeros or empty strings as null), you can use the fact that NaN is the only value that compares unequal to itself:
data = pd.read_excel('data.xlsx')
amount_nan = 0
for column in data:
    for value in data[column]:
        # NaN is the only value for which value != value is True
        if value != value:
            amount_nan = amount_nan + 1
print(amount_nan)
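For reference, outside the constraints of the exercise, pandas can do this without an explicit loop; a minimal sketch, assuming the same hypothetical data.xlsx:
import pandas as pd
data = pd.read_excel('data.xlsx')
# isna().sum() counts nulls per column; the second sum() totals them
print(data.isna().sum().sum())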

for-loop to retrieve and store variables in a list [closed]

I retrieve several dataframes from spreadsheets of an Excel file.
I would like to store these dataframes into a list so that I can concatenate the dataframes into one dataframe.
However, how can I store the variables themselves instead of their names?
These are the data frames that I created.
df0120
df0220
df0320
df0419
df0519
df0619
df0719
df0819
df0919
df_lst = list()
for name in dir():
    if name.startswith('df0'):
        df_lst.append(name)
print(df_lst)
My results
['df0120', 'df0220'...]
Expected results
[df0120, df0220 ...]
What you see is simply how Python prints a list of strings. You can format it differently yourself if you want:
print('['+', '.join(df_lst)+']')
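If what you actually need is the DataFrame objects themselves (so they can be concatenated), you can look the names up in the current namespace instead of appending the strings; a minimal sketch, assuming the df0... frames exist as variables in the same scope:
import pandas as pd
namespace = vars()  # the current namespace as a dict mapping name -> object
df_lst = [namespace[name] for name in dir()
          if name.startswith('df0') and isinstance(namespace[name], pd.DataFrame)]
combined = pd.concat(df_lst, ignore_index=True)
A cleaner pattern avoids the numbered variables entirely: pd.read_excel('file.xlsx', sheet_name=None) returns a dict of DataFrames keyed by sheet name, which can be passed straight to pd.concat.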

Pandas dataframe overwrite object [closed]

I would like to overwrite my matrix of dimension n with a matrix of dimension m (n > m). Intuitive code like this does not work:
sigmaSmall = sigmaSmall.loc[indices, indices]
How can I do it in 1 line?
The 2nd dimension takes column names, not numbered indices.
So instead do:
sigmaSmall = sigmaSmall.loc[indices, sigmaSmall.columns[indices]]
Not knowing what your indices are makes it hard to tell, but it should look something like this:
df = pd.DataFrame([[1,2,3],[1,2,3],[1,2,3]], columns=['a','b','c'])
df.loc[0:1, ['a','b']]
Where the second argument is the column names that you want to select
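If the row and column selections are both positional, .iloc takes integer positions on both axes directly; a minimal sketch:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]], columns=['a', 'b', 'c'])
indices = [0, 1]
# .iloc is purely positional, so the same list of positions works on both axes
df.iloc[indices, indices]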

conversion on pandas queries python [closed]

I want to read a .dat file, extract the rows whose timestamp is in the year 2000, and save them to a file. Here is my code; I get a conversion error on the query (error screenshot).
Can someone help me?
Make sure your field is a datetime:
ratings_df['timestamp'] = pd.to_datetime(ratings_df['timestamp'])
You can then pull year from it
ratings_df['timestamp'].dt.year
I found the answer:
def dateparse(time_in_secs):
    return datetime.datetime.fromtimestamp(float(time_in_secs))

ratings_df = pd.read_table('~/ml-1m/ratings.dat', header=None, sep='::',
                           names=['user_id', 'movie_id', 'rating', 'timestamp'],
                           parse_dates=['timestamp'], date_parser=dateparse)
Thank you for your help.
You should use the pandas function to_datetime instead of the native datetime functionality:
ratings_df.loc[pd.to_datetime(ratings_df['timestamp'], unit='s').dt.year == 2000, colonnes]
(colonnes here stands for the list of columns you want to keep.) The format of the timestamp is an important factor in getting the right output.
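Putting the pieces together, a minimal sketch (the file path, separator, and column names follow the question; the output file name is hypothetical):
import pandas as pd

# The '::' separator is multi-character, so the python engine is required
ratings_df = pd.read_table('~/ml-1m/ratings.dat', header=None, sep='::',
                           names=['user_id', 'movie_id', 'rating', 'timestamp'],
                           engine='python')

# Convert the Unix-seconds column and keep only rows from the year 2000
ts = pd.to_datetime(ratings_df['timestamp'], unit='s')
ratings_df[ts.dt.year == 2000].to_csv('ratings_2000.csv', index=False)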
