I created an array full of number. I want to add data from the array to the prettytable basically using add_columns
number=[1,2,3,4,5,6,7,8,9,.....,100]
I want the output of the pretty table to be like this below
+-----------+------+------------+-----------------+
| Number |1 | 2 | 3 | 4 | ..... |
+-----------+------+------------+-----------------+
My code is shown below.
from prettytable import PrettyTable
x = PrettyTable()
x.add_column(["number", print(number)])
print(x)
When I run the python script, it generated an error
TypeError: add_column() missing 1 required positional argument: 'column'
How to achieve this?
you should use add raw like this and if
you need to add 'number' as text you should added in the list
number=['number',1,2,3,4,5,6,7,8,9]
from prettytable import PrettyTable
x = PrettyTable()
x.add_row(number)
print(x)
OUTPUT:
+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7 | Field 8 | Field 9 | Field 10 |
+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
Related
Let's say I have table which would look like that
| id | value_one | type | value_two |
|----|-----------|------|-----------|
| 1 | 2 | A | 1 |
| 1 | 4 | B | 1 |
| 2 | 3 | A | 2 |
| 2 | 1 | B | 3 |
I know that there are only A and B types for specific ID, what I want to achieve is to group those two values and calculate new type using formula A/B, it should be applied to value_one and value_two, so table afterwards should look like:
| id | value_one | type | value_two|
|----|-----------| -----|----------|
| 1 | 0.5 | C | 1 |
| 2 | 3 | C | 0.66 |
I am new to PySpark, and as for now I wasn't able to achieve described result, would appreciate any tips/solutions.
You can consider dividing the original dataframe into two parts according to type, and then use SQL statements to implement the calculation logic.
df.filter('type = "A"').createOrReplaceTempView('tmp1')
df.filter('type = "B"').createOrReplaceTempView('tmp2')
sql = """
select
tmp1.id
,tmp1.value_one / tmp2.value_one as value_one
,'C' as type
,tmp1.value_two / tmp2.value_two as value_two
from tmp1 join tmp2 using (id)
"""
reuslt_df = spark.sql(sql)
reuslt_df.show(truncate=False)
so I currently have a large csv containing data for a number of events.
Column one contains a number of dates as well as some id's for each event for example.
Basically I want to write something within Python that whenever there is an id number (AL.....) it creates a new csv with the id number as the title with all the data in it before the next id number so i end up with a csv for each event.
For info the whole csv contains 8 columns but the division into individual csvs is only predicated on column one
Use Python to split a CSV file with multiple headers
I notice this questions is quite similar but in my case I I have AL and then a different string of numbers after it each time and also I want to call the new csvs by the id numbers.
You can achieve this using pandas, so let's first generate some data:
import pandas as pd
import numpy as np
def date_string():
return str(np.random.randint(1, 32)) + "/" + str(np.random.randint(1, 13)) + "/1997"
l = [date_string() for x in range(20)]
l[0] = "AL123"
l[10] = "AL321"
df = pd.DataFrame(l, columns=['idx'])
# -->
| | idx |
|---:|:-----------|
| 0 | AL123 |
| 1 | 24/3/1997 |
| 2 | 8/6/1997 |
| 3 | 6/9/1997 |
| 4 | 31/12/1997 |
| 5 | 11/6/1997 |
| 6 | 2/3/1997 |
| 7 | 31/8/1997 |
| 8 | 21/5/1997 |
| 9 | 30/1/1997 |
| 10 | AL321 |
| 11 | 8/4/1997 |
| 12 | 21/7/1997 |
| 13 | 9/10/1997 |
| 14 | 31/12/1997 |
| 15 | 15/2/1997 |
| 16 | 21/2/1997 |
| 17 | 3/3/1997 |
| 18 | 16/12/1997 |
| 19 | 16/2/1997 |
So, interesting positions are 0 and 10 as there are the AL* strings...
Now to filter the AL* you can use:
idx = df.index[df['idx'].str.startswith('AL')] # get's you all index where AL is
dfs = np.split(df, idx) # splits the data
for out in dfs[1:]:
name = out.iloc[0, 0]
out.to_csv(name + ".csv", index=False, header=False) # saves the data
This gives you two csv files named AL123.csv and AL321.csv with the first line being the AL* string.
This question already has an answer here:
Pandas - Finding percent contributed by each group
(1 answer)
Closed 2 years ago.
I have a df and I want to create some new cols with it. How would I use the apply function to both pass in the row, and the entire df with it? I need the entire df to do some filtering, and the data is subject to the values in each row.
Or maybe I don't need to use apply, but that's the first thing that came to my mind. Thank you and all help is appreciated!
Ex of df:
+----+--------+--------+
| ID | Family | Amount |
+----+--------+--------+
| 1 | A | 2 |
| 2 | A | 10 |
| 3 | B | 4 |
| 4 | B | 7 |
+----+--------+--------+
Result:
+----+--------+--------+-----------+------------+
| ID | Family | Amount | Total_Fam | Id_Percent |
+----+--------+--------+-----------+------------+
| 1 | A | 2 | 12 | .166 |
| 2 | A | 10 | 12 | .833 |
| 3 | B | 4 | 11 | .363 |
| 4 | B | 7 | 11 | .636 |
+----+--------+--------+-----------+------------+
First, group by Family and then transform amount and then you can directly divide Amount by the new column.
df['Total_Fam'] = df.groupby('Family')['Amount'].transform(np.sum)
df['Id_Percent'] = df['Amount']/df['Total_Fam']
df
Using apply on a column passes each row individualy. If you use apply on the entire dataset, it sees the entire dataset, hence, you can use all columns. As you can see in the example below, df['new_2] which is made using a function which I apply to the dataset, I do not need to pass the df to it.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('iris')
df['new'] = df['species'].apply(lambda x: x[:2])
def sumIsMore(dataframe):
x = dataframe['sepal_length']
y = dataframe['sepal_width']
return x+y >= 8.5
df['new_2'] = df.apply(sumIsMore, axis=1)
I am trying to output in a new column integers values (labels/classes) based on labels of another column in my dataset. Actually I did it by creating new columns (numerical column heading) for each class with boolean values in them, so then I can use these to create the new class column with numerical values. But I was trying to do it with a dictionary, which I think it is a good and faster way.
If I run a code like this:
x=df['Item_Type'].value_counts()
item_type_mapping={}
item_list=x.index
for i in range(0,len(item_list)):
item_type_mapping[item_list[i]]=i
It generates the dictionary, but then if I run:
df['Item_Type']=df['Item_Type'].map(lambda x:item_type_mapping[x])
or
df['New_column']=[item_type_mapping[item] for item in data.Item_Type]
It displays KeyError=None
Anybody know why this occurs? I think that's strange since the dictionary has been created and I can see it through my variables
Thanks
Edit 1
#Fourier
simply I have this column:
| Item_type|
| -------- |
| Nino |
| Nino |
| Nino |
| Pasquale |
| Franco |
| Franco |
and then I need the same column or a new one to display:
| Item_type| New_column |
| -------- | ---------- |
| Nino | 1 |
| Nino | 1 |
| Nino | 1 |
| Pasquale | 2 |
| Franco | 3 |
| Franco | 3 |
Your code works for me, but what you're trying to do is already provided by pandas as Categorical data.
df = pd.DataFrame({'Item_Type': list('abca')})
df['New_column'] = df.Item_Type.astype('category').cat.codes
Result:
Item_Type New_column
0 a 0
1 b 1
2 c 2
3 a 0
I am trying to select a grouped average.
a1_avg = session.query(func.avg(Table_A.a1_value).label('a1_avg'))\
.filter(between(Table_A.a1_date, '2011-10-01', '2011-10-30'))\
.group_by(Table_A.a1_group)
I have tried a few different iterations of this query and this is as close as I can get to what I need. I am fairly certain the group_by is creating the issue, but I am unsure how to correctly implement the query using SQLA. The table structure and expected output:
TABLE A
A1_ID | A1_VALUE | A1_DATE | A1_LOC | A1_GROUP
1 | 5 | 2011-10-05 | 5 | 6
2 | 15 | 2011-10-14 | 5 | 6
3 | 2 | 2011-10-21 | 6 | 7
4 | 20 | 2011-11-15 | 4 | 8
5 | 6 | 2011-10-27 | 6 | 7
EXPECTED OUTPUT
A1_LOC | A1_GROUP | A1_AVG
5 | 6 | 10
6 | 7 | 4
I would guess that you are just missing the group identifier (a1_group) in the result. Also (given I understand your model correctly), you need to add a group by clause also for a1_loc column:
edit-1: updated the query due to question specificaion
a1_avg = session.query(Table_A.a1_loc, Table_A.a1_group, func.avg(Table_A.a1_value).label('a1_avg'))\
.filter(between(Table_A.a1_date, '2011-10-01', '2011-10-30'))\
#.filter(Table_A.a1_id == '12')\ # #note: you do NOT NEED this
.group_by(Table_A.a1_loc)\ # #note: you NEED this
.group_by(Table_A.a1_group)