Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago.
I want to read a .dat file, extract the lines whose timestamp year is 2000, and save them to a file. Here is my code, and I get an error when converting (see the error screenshot). Can someone help me?
Make sure your field is a datetime:

ratings_df['timestamp'] = pd.to_datetime(ratings_df['timestamp'])

You can then pull the year from it:

ratings_df['timestamp'].dt.year
I found the answer:

import datetime
import pandas as pd

def dateparse(time_in_secs):
    return datetime.datetime.fromtimestamp(float(time_in_secs))

ratings_df = pd.read_table('~/ml-1m/ratings.dat', header=None, sep='::',
                           engine='python',  # multi-character separators need the Python engine
                           names=['user_id', 'movie_id', 'rating', 'timestamp'],
                           parse_dates=['timestamp'], date_parser=dateparse)

Thank you for your help.
You should use the pandas function to_datetime instead of the native datetime functionality:

ratings_df.loc[pd.to_datetime(ratings_df['timestamp'], unit='s').dt.year == 2000, colonnes]

The format of the timestamp is an important factor in getting the right output.
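Putting the answers together, here is a minimal self-contained sketch; the inline DataFrame stands in for the MovieLens ratings.dat from the question, whose timestamps are Unix seconds:

```python
import pandas as pd

# Small stand-in for the ratings table; in the question the data comes
# from pd.read_table('~/ml-1m/ratings.dat', sep='::', ...).
ratings_df = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [1193, 661, 914],
    'rating': [5, 3, 3],
    'timestamp': [978300760, 956703954, 1046454590],  # Unix seconds
})

# Convert seconds-since-epoch to datetimes, then keep only year-2000 rows.
ts = pd.to_datetime(ratings_df['timestamp'], unit='s')
year_2000 = ratings_df.loc[ts.dt.year == 2000]
print(year_2000)
```

The filtered rows can then be written out with, for example, year_2000.to_csv('ratings_2000.dat', index=False).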
Related
Closed 8 months ago.
I am trying to convert a pandas column value from str to dict.
The value looks like this :
'""en-US":"!!PLAY""'
I tried eval, ast.literal_eval, and json.loads, and I only get errors (SyntaxError or json.decoder.JSONDecodeError).
The only solution I see left is to find each string with a regex and build a dict afterwards, which is pretty heavy. Anyone with an idea?
Split the string at ":", then add the parts to a dictionary, using slicing to remove the extra quotes:

key, val = '""en-US":"!!PLAY""'.split(":")
dic = {key[2:-1]: val[1:-2]}
print(dic)  # {'en-US': '!!PLAY'}
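For the record, the regex route the question mentions need not be heavy. A sketch, assuming the keys and values themselves contain no quotes or colons:

```python
import re

raw = '""en-US":"!!PLAY""'

# Pull out every maximal run of characters that is neither a quote nor a
# colon; consecutive pairs of tokens become key/value entries.
tokens = re.findall(r'[^":]+', raw)
dic = dict(zip(tokens[::2], tokens[1::2]))
print(dic)  # {'en-US': '!!PLAY'}
```

Unlike the slicing answer, this also copes with strings holding several "key":"value" pairs, under the same no-embedded-quotes assumption.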
Closed. This question needs details or clarity. It is not currently accepting answers. Closed 1 year ago.
This is how my .dat file looks. I want to know how to extract data from it, e.g. 1::Toy Story (1995), with each field in a separate column. I also want to do it without pandas and numpy; is that possible?
with open('ml-1m/movies.dat', encoding='iso-8859-1') as datFile:
    print([data.split('::')[0] for data in datFile])
Here is one way, with the result as a dictionary (movies.dat lines have three '::'-separated fields: id, title, genres):

dict_of_film = {}
with open(r"path", encoding='iso-8859-1') as f:
    for line in f:
        index, name, genre = line.rstrip('\n').split('::')
        dict_of_film[index] = {"name": name, "genre": genre}
print(dict_of_film)
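A self-contained sketch of the same idea with each field kept as a separate column; the sample lines stand in for the contents of movies.dat:

```python
# Parse MovieLens-style '::'-delimited lines into rows of columns
# without pandas or numpy.
sample = """1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy"""

rows = [line.split('::') for line in sample.splitlines()]
for movie_id, title, genres in rows:
    print(movie_id, title, genres, sep=' | ')
```

With a real file, replace the sample with open('ml-1m/movies.dat', encoding='iso-8859-1') and iterate over the file object instead.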
Closed 2 years ago.
Write Python code to find the total number of null values in the Excel file without using the isnull function (you should use a loop statement).
Using a for loop over a DataFrame is not good practice. If you're doing it only as a challenge, this could help (comparing each value with itself avoids fillna, which would also miscount genuinely falsy cells such as 0 or an empty string):

import pandas as pd

data = pd.read_excel('data.xlsx')
amount_nan = 0
for column in data:
    for value in data[column]:
        if value != value:  # NaN is the only value that is not equal to itself
            amount_nan += 1
print(amount_nan)
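The loop can be checked on an in-memory frame. This sketch relies on NaN being the only value that compares unequal to itself (per IEEE 754), so falsy cells like 0 or '' are not miscounted:

```python
import pandas as pd
import numpy as np

# Two NaNs, plus falsy-but-valid values (0 and '') that a
# fillna(False)-based count would wrongly include.
df = pd.DataFrame({'a': [1, np.nan, 0], 'b': ['', np.nan, 'x']})

amount_nan = 0
for column in df:
    for value in df[column]:
        if value != value:  # true only for NaN
            amount_nan += 1
print(amount_nan)  # 2
```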
Closed 2 years ago.
I'm stuck. I want to assign a CSV file to a variable; if the file is absent, a second file should be used instead.
I have this code, but it doesn't work:
a = pd.read_csv('me.csv') or 'report_generator.csv'
print(a)
You could try something like this:

import pandas as pd

try:
    df = pd.read_csv('one.csv')
except Exception:
    df = pd.read_csv('two.csv')
print(df.head())

It would try to open 'one.csv', and if any error occurs (note that an error is raised even when the csv exists but its data is corrupted), it would open 'two.csv' instead.
[EDITED]
A safer and more explicit solution is to check whether the file exists first:

import os.path as path

if path.exists('one.csv'):
    df = pd.read_csv('one.csv')
elif path.exists('two.csv'):
    df = pd.read_csv('two.csv')
else:
    print("Could not find csv")
Closed 4 years ago.
When working with pandas, I often use name-based column indexing, e.g.:
df = pd.DataFrame({"abc":[1,2], "bde":[3,4], "mde":[3,4]})
df[["mde","bde"]]
As column names get longer, it becomes easy for me to make a typo, since they are strings with no code completion. It'd be great if I could do something like:
df.SelectColumnsByObjectAttributeNotString([df.mde, df.bde])
IIUC, you can use the name attribute:
df = pd.DataFrame({"a":[1,2], "b":[3,4]})
columns = [df.a.name, df.b.name]
columns
['a', 'b']
I think you may be looking for:
df.columns.values.tolist()
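The two answers combine into a short sketch: attribute access gives editor completion, and .name recovers the string label needed for the selection:

```python
import pandas as pd

df = pd.DataFrame({"abc": [1, 2], "bde": [3, 4], "mde": [3, 4]})

# Each df.<col> attribute is a Series whose .name is the column label,
# so completed attribute names can drive name-based selection.
selected = df[[df.mde.name, df.bde.name]]
print(selected.columns.tolist())  # ['mde', 'bde']
```

Note that attribute access only works for column names that are valid Python identifiers and do not clash with DataFrame methods.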