I have two Excel files that I converted into DataFrames.
DF1: contains columns 'JobKey' and 'Aircraft Number' (amongst other data)
DF2: contains columns 'JobKey' and 'Shortage' (amongst other data)
I want to create a column 'Short' in DF1 by mapping in the 'Shortage' values for the JobKeys present in DF2 (effectively a VLOOKUP).
For both I set JobKey as the index:
#Import relevant libraries:
import pandas as pd
import numpy as np
DF1 = pd.read_excel('...')
DF2 = pd.read_excel('...')
DF1['Short'] = " "
DF1.set_index('JobKey', inplace = True)
DF2.set_index('JobKey', inplace = True)
Both look OK; I printed a sample of each using DF.head(). I want to use .index.map() as was done here:
https://towardsdatascience.com/vlookup-implementation-in-python-in-three-simple-steps-93b5a290fd72
DF1["Short"]=DF1.index.map(DF2["Shortage"])
However I get the error:
---------------------------------------------------------------------------
C:\ProgramData\Anaconda3\lib\site-packages\pandas\indexes\base.py in map(self, mapper)
2439 applied : array
2440 """
-> 2441 return self._arrmap(self.values, mapper)
2442
2443 def isin(self, values, level=None):
pandas\src\algos_common_helper.pxi in pandas.algos.arrmap_object (pandas\algos.c:46681)()
TypeError: 'Series' object is not callable
-------
Any ideas as to why? It seems pretty straightforward, yet I can't find the cause of the problem.
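For what it's worth, that traceback looks like it comes from an older pandas version, where Index.map() only accepted a callable; passing a Series (or dict) to it was added later (0.21, if I recall correctly). A join achieves the same lookup and works on older versions too. A minimal sketch with made-up data standing in for the two Excel files:

```python
import pandas as pd

# Hypothetical data standing in for the two Excel files
DF1 = pd.DataFrame({'JobKey': ['A1', 'A2', 'A3'],
                    'Aircraft Number': [101, 102, 103]}).set_index('JobKey')
DF2 = pd.DataFrame({'JobKey': ['A1', 'A3'],
                    'Shortage': ['Yes', 'No']}).set_index('JobKey')

# Left-join on the shared index; JobKeys missing from DF2 come back as NaN
DF1 = DF1.join(DF2['Shortage'].rename('Short'))

# On newer pandas, the original line also works:
# DF1['Short'] = DF1.index.map(DF2['Shortage'])
print(DF1['Short'].tolist())  # ['Yes', nan, 'No']
```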
I have a column 'a' in a dataframe where the data looks like this:
[[-1.13855, -1.13855, -1.2212, -1.27331, -1.32733, -1.39211, -1.46947, -1.55818, -1.65584, -1.75972, -1.86731, -1.97665, -2.08624, -2.19495, -2.30196, -2.40665, -2.50857, -2.60744, -2.70314, -2.79567, -2.885, -2.97102, -3.05448, -3.13868, -3.22736, -3.31942, -3.41041, -3.49954, -3.59207, -3.69467, -3.81331, -3.96048, -4.15626, -4.43863, -4.90479, -5.79363, -6.24746, -4.26896, -3.14354, -2.44187, -1.9507, -1.57115, -1.23503, -0.893369, -0.528228, -0.0869591, 0.616627, 0.406154, -0.479933, -0.479933],...]]
I have been told it's a numpy array (not sure if that's the case, correct me if I'm wrong...).
I wish to put this dataframe into an SQL database using pandas' to_sql method.
In order to do so, I need to convert this 'object' to a dtype that SQL accepts, e.g. a string.
I must be able to write it as a string and then retrieve it as a numpy array (or whatever it is).
However, I am getting the error mentioned in the title.
My code currently looks roughly like this:
import pandas as pd
import sqlite3 as sql
import sqlalchemy
import numpy as np
import io
from datetime import datetime
time = datetime.strptime('2020-01-01 00:00:00', '%Y-%m-%d %H:%M:%S')
testdata = {'time': time , 'a': [[[-1.13855, -1.13855, -1.2212, -1.27331, -1.32733, -1.39211, -1.46947, -1.55818, -1.65584, -1.75972, -1.86731, -1.97665, -2.08624, -2.19495, -2.30196, -2.40665, -2.50857, -2.60744, -2.70314, -2.79567, -2.885, -2.97102, -3.05448, -3.13868, -3.22736, -3.31942, -3.41041, -3.49954, -3.59207, -3.69467, -3.81331, -3.96048, -4.15626, -4.43863, -4.90479, -5.79363, -6.24746, -4.26896, -3.14354, -2.44187, -1.9507, -1.57115, -1.23503, -0.893369, -0.528228, -0.0869591, 0.616627, 0.406154, -0.479933, -0.479933]]]}
testdata = pd.DataFrame(testdata,index=['time'])
testdata
test_rows = []
for index, row in testdata.iterrows():
    t = row['time']
    a = row['a'].astype('str')
    new_rows = {'time': t, 'a': a}
    test_rows.append(pd.Series(new_rows))
testframe = pd.DataFrame(test_rows)
testframe.set_index('time')
print(testframe.dtypes)
testframe
testframe.to_sql(name='Data2_mcw_conv',con=conn,if_exists='replace',index=True)
output:
time datetime64[ns]
a object
dtype: object
---------------------------------------------------------------------------
InterfaceError Traceback (most recent call last)
<ipython-input-82-c434bf1560a6> in <module>()
27 testframe
28
---> 29 testframe.to_sql(name='testframe',con=conn,if_exists='replace',index=True)
30
31
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/io/sql.py in _execute_insert(self, conn, keys, data_iter)
1553 def _execute_insert(self, conn, keys, data_iter):
1554 data_list = list(data_iter)
-> 1555 conn.executemany(self.insert_statement(num_rows=1), data_list)
1556
1557 def _execute_insert_multi(self, conn, keys, data_iter):
InterfaceError: Error binding parameter 2 - probably unsupported type.
Before attempting to convert to SQL using to_sql, the index seems to be an integer instead of the time, even though I explicitly set the time column as the index with set_index.
From here, I have the following problems:
How do I rectify this error?
For some reason, my index is an integer, how do I make the time column the index?
Any help would be much appreciated
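Two things seem to be going on here, sketched below with a tiny made-up frame (names and values are illustrative, not the original data). First, sqlite3 cannot bind a list/array directly, but it can bind a string, so serializing the array column to JSON before to_sql (and parsing it back after reading) gives a lossless round trip. Second, set_index returns a new DataFrame unless inplace=True is passed, which would explain the integer index:

```python
import json
import sqlite3
import numpy as np
import pandas as pd

# Made-up single row standing in for the real data
df = pd.DataFrame({'time': pd.to_datetime(['2020-01-01 00:00:00']),
                   'a': [np.array([-1.13855, -1.2212, -1.27331])]})

# Serialize each array to a JSON string sqlite3 can store
df['a'] = df['a'].apply(lambda arr: json.dumps(np.asarray(arr).tolist()))

# set_index returns a new frame: assign it back (or use inplace=True)
df = df.set_index('time')

conn = sqlite3.connect(':memory:')
df.to_sql(name='Data2_mcw_conv', con=conn, if_exists='replace', index=True)

# Round trip: read back and restore the numpy array
out = pd.read_sql('SELECT * FROM Data2_mcw_conv', conn)
restored = np.array(json.loads(out.loc[0, 'a']))
print(restored)
```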
When I try to remove some elements that satisfy a particular condition, Python throws the following error:
TypeError Traceback (most recent call last)
<ipython-input-25-93addf38c9f9> in <module>()
4
5 df = pd.read_csv('fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv;
----> 6 df = filter(df,~('-02-29' in df['Date']))
7 '''tmax = []; tmin = []
8 for dates in df['Date']:
TypeError: 'int' object is not iterable
The following is the code :
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv');
df = filter(df,~('-02-29' in df['Date']))
What could I be doing wrong?
A sample of the data is attached as an image.
Use df.filter() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html)
Also please attach the csv so we can run it locally.
Another way to do this is to use one of pandas' string methods for Boolean indexing:
df = df[~ df['Date'].str.contains('-02-29')]
You will still have to make sure that all the dates are actually strings first.
Edit:
Seeing the picture of your data, maybe this is what you want (slashes instead of hyphens):
df = df[~ df['Date'].str.contains('/02/29')]
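To spell out where the original error comes from: the code calls Python's built-in filter(), whose second argument must be an iterable; ~('-02-29' in df['Date']) evaluates to an int, hence "'int' object is not iterable". The Boolean-indexing approach can be sketched like this (with made-up rows standing in for the CSV):

```python
import pandas as pd

# Made-up rows standing in for the CSV; the goal is to drop leap-day entries
df = pd.DataFrame({'Date': ['2012/02/28', '2012/02/29', '2012/03/01'],
                   'Data_Value': [10, 11, 12]})

# Keep only rows whose Date does NOT contain the leap day
df = df[~df['Date'].astype(str).str.contains('/02/29')]
print(df['Date'].tolist())  # ['2012/02/28', '2012/03/01']
```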
I am trying to convert a PyCall.jlwrap ('Julia') object to a Pandas dataframe. I'm using PyJulia to run an optimization algorithm in Julia, which spits out a dataframe object as a result. I would like to convert that object to a Pandas dataframe.
This is similar to a question posed 5 years ago here. However, that question does not include any code showing how to accomplish the transfer.
Any help would be useful!
Here is the code I currently have set up. The details of my 'optimization_program' aren't important; what matters is that the 'run_hybrid' and 'run_storage' commands each return a data frame:
### load in necessary modules for pyjulia
from julia import Main as jl
##load my user defined module
jl.include("optimization_program_v3.jl")
##run function from module
results = jl.run_hybrid(generic_inputs)
##test type of item returned
jl.typeof(results)
returns: <PyCall.jlwrap DataFrame>
##try to convert to pandas
test = pd.DataFrame(results)
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 test = pd.DataFrame(results)

in __init__(self, data, index, columns, dtype, copy)
    420                                 dtype=values.dtype, copy=False)
    421             else:
--> 422                 raise ValueError('DataFrame constructor not properly called!')
    423
    424         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!
I get an error (reading a Julia DataFrame in Python), if I use the DataFrames.jl package. However, it seems to work nicely with the Pandas.jl package:
>>> from julia import Main as jl
>>> import pandas as pd
>>> jl.eval('using Pandas')
>>> res = jl.eval('DataFrame(Dict(:age=>[27, 29, 27], :name=>["James", "Jill", "Jake"]))')
>>> jl.typeof(res)
#<PyCall.jlwrap PyObject>
>>> df = pd.DataFrame(res)
>>> df
age name
0 27 James
1 29 Jill
2 27 Jake
This was tested on Win10, with Python 3.8.2, and Julia 1.3.1
I want to convert all the items in the 'Time' column of my pandas dataframe from UTC to Eastern time. However, following the answer in this stackoverflow post, some of the keywords are not known in pandas 0.20.3. Overall, how should I do this task?
tweets_df = pd.read_csv('valid_tweets.csv')
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
error is:
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 3081, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_datetime'
items from the Time column look like this:
2016-10-20 03:43:11+00:00
Update:
using
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
tweets_df.index = tweets_df.index.tz_localize('UTC').tz_convert('US/Eastern')
did no time conversion. Any idea what needs to be fixed?
Update 2:
So the following code does not convert in place: when I print row['Time'] using iterrows(), it still shows the original values. Do you know how to do the conversion in place?
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
for index, row in tweets_df.iterrows():
    row['Time'].tz_localize('UTC').tz_convert('US/Eastern')

for index, row in tweets_df.iterrows():
    print(row['Time'])
to_datetime is a function defined in pandas, not a method on a DataFrame. Try:
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
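As for the later updates: tz_localize and tz_convert return new objects, so calling them inside iterrows() without assigning the result changes nothing. Assigning the converted Series back to the column (via the .dt accessor) makes the conversion stick. A minimal sketch with one made-up timestamp in place of valid_tweets.csv:

```python
import pandas as pd

# One made-up row standing in for valid_tweets.csv
tweets_df = pd.DataFrame({'Time': ['2016-10-20 03:43:11+00:00']})

# Parse as UTC-aware datetimes (the strings already carry a +00:00 offset)
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'], utc=True)

# tz_convert returns a new Series, so assign it back to the column
tweets_df['Time'] = tweets_df['Time'].dt.tz_convert('US/Eastern')
print(tweets_df['Time'].iloc[0])  # 2016-10-19 23:43:11-04:00
```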