I want to create a key list for a large HDF file from an Excel file:
I want the key list to look like this:
/blue/a/a1
/blue/a/a2
/blue/b/b1
...
My code so far is:
import pandas as pd
import numpy as np
df = pd.read_excel('file.xlsx', usecols = ['A', 'B', 'C'])
print(df)
list1, list2, list3 = df['A'].tolist(), df['B'].tolist(), df['C'].tolist()
print(list1,list2,list3)
for i in list1:
    list1[i].append(list2[i]).append(list3[i])
print(list1)
The conversion to 3 lists works. Then I try to append the rows of each list together, without success. Is there a simple way to do that?
Use zip, then str.join to get the required output.
Example:
res = []
list1, list2, list3 = df['A'].tolist(), df['B'].tolist(), df['C'].tolist()
for i in zip(list1, list2, list3):
    val = map(str, i)
    res.append("/{0}".format("/".join(val)))
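Here is a self-contained version of the zip approach, offered as a sketch. The in-memory DataFrame is hypothetical sample data standing in for file.xlsx, built from the keys shown in the question:

```python
import pandas as pd

# Hypothetical sample data standing in for file.xlsx (columns A, B, C assumed)
df = pd.DataFrame({
    "A": ["blue", "blue", "blue"],
    "B": ["a", "a", "b"],
    "C": ["a1", "a2", "b1"],
})

# zip walks the three columns in parallel, yielding one tuple per row;
# map(str, ...) guards against non-string cells before joining with '/'
res = ["/" + "/".join(map(str, row)) for row in zip(df["A"], df["B"], df["C"])]
print(res)  # ['/blue/a/a1', '/blue/a/a2', '/blue/b/b1']
```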
You can use apply to create the new column as required and then convert it to a list:
import pandas as pd
import numpy as np
df = pd.read_excel('file.xlsx', usecols = ['A', 'B', 'C'])
print(df)
x=df[['A','B','C']].apply(lambda row: '/'+row['A']+'/'+row['B']+'/'+row['C'],axis=1)
x.tolist()
Update, shorter code:
x=df[['A','B','C']].apply(lambda row: '/'+'/'.join(row),axis=1)
x.tolist()
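One caveat, which depends on your data: '/'.join(row) raises a TypeError if any cell is not a string, for example an integer in column C. Casting with astype(str) first avoids that. The sample frame below is made up for illustration:

```python
import pandas as pd

# Hypothetical sample: column C holds integers, which '/'.join(row) alone would reject
df = pd.DataFrame({"A": ["blue", "blue"], "B": ["a", "b"], "C": [1, 2]})

# Cast every cell to str first, then join each row's values with '/'
keys = (
    df[["A", "B", "C"]]
    .astype(str)
    .apply(lambda row: "/" + "/".join(row), axis=1)
    .tolist()
)
print(keys)  # ['/blue/a/1', '/blue/b/2']
```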
I am stuck on a project where I have to separate all the dictionary items from a list and create a DataFrame from them. Below is a link to the JSON file.
Link:- https://drive.google.com/file/d/1H76rjDEZweVGzPcziT5Z6zXqzOSmVZQd/view?usp=sharing
I wrote this code, which converts all the list items into strings, so I am able to separate them into a new list. However, the collected items are not being converted into a DataFrame. Any help would be highly appreciated.
read_cont = []
new_list1 = []
new_list2 = []
for i in rjson:
    for j in rjson[i]:
        read_cont.append(rjson[i][j])
data_filter = read_cont[1]
for item in data_filter:
    for j in item:
        new_list1.append(item[j])
new_list1 = map(str, new_list1)
for i in new_list1:
    if len(i) > 100:
        new_list2.append(i)
header_names = ["STRIKE PRICE","EXPIRY","underlying", "identifier","OPENINTEREST","changeinOpenInterest","pchangeinOpenInterest", "totalTradedVolume","impliedVolatility","lastPrice","change","pChange", "totalBuyQuantity","totalSellQuantity","bidQty","bidprice","askQty","askPrice","underlyingValue"]
df = pd.DataFrame(new_list2, columns=header_names)
It should look something like this:
Columns: [STRIKE PRICE, EXPIRY, underlying, identifier, OPENINTEREST, changeinOpenInterest, pchangeinOpenInterest, totalTradedVolume, impliedVolatility, lastPrice, change, pChange, totalBuyQuantity, totalSellQuantity, bidQty, bidprice, askQty, askPrice, underlyingValue]
Index: []
import json
import pandas as pd
h = json.load(open('scrap.json'))
mdf = pd.DataFrame()
for i in h['records']['data']:
    for k in i:
        if isinstance(i[k], dict):
            df = pd.DataFrame(i[k], index=[0])
            mdf = pd.concat([mdf, df])
            continue
print(mdf)
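A variant of the same idea, offered as a sketch, collects the nested dicts in a list and builds the DataFrame once instead of calling pd.concat on every iteration. The small h dict below is a hypothetical stand-in for scrap.json. The real file's layout (records, then data, then nested dicts) is only inferred from the answer's code:

```python
import pandas as pd

# Hypothetical miniature of the assumed JSON layout: h["records"]["data"] is a
# list of dicts, and some of their values are themselves dicts (e.g. CE / PE)
h = {
    "records": {
        "data": [
            {"strikePrice": 100,
             "CE": {"strikePrice": 100, "lastPrice": 5.0},
             "PE": {"strikePrice": 100, "lastPrice": 1.0}},
            {"strikePrice": 110,
             "CE": {"strikePrice": 110, "lastPrice": 2.5}},
        ]
    }
}

# Keep every value that is a dict; each one becomes one row of the frame
rows = [v for rec in h["records"]["data"] for v in rec.values() if isinstance(v, dict)]
mdf = pd.DataFrame(rows)
print(mdf)
```

Building the frame once also gives a clean 0..n-1 index. Concatenating single-row frames created with index=[0] repeats index 0 for every row.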
I'm trying to filter list1 based on another list, list2, with the following code:
import csv
with open('screen.csv') as f:  # A file with a list of all the article titles
    reader = csv.reader(f)
    list1 = list(reader)
print(list1)
list2 = ["Knowledge Management", "modeling language"]  # Key words that an article title should have (at least one of them)
list2 = [str(x) for x in list2]
occur = [i for i in list1 for j in list2 if str(j) in i]
print(occur)
but the output is empty.
My list1 looks like this:
list_1 is actually a list of lists, not a list of strings, so you need to flatten it before comparing elements:
list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']
flattened_list_1 = [
    element
    for sublist in list_1
    for element in sublist
]
occurrences = [
    phrase
    for phrase in flattened_list_1 if any(
        word in phrase
        for word in list_2
    )
]
print(occurrences)
# output:
# ['foo bar']
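import pandas as pd
import numpy as np
# data (your list of titles) and another_list (the phrases to match) must be defined first
df = pd.DataFrame(data)
# column_of_list is a placeholder for the name of the column holding the titles
print(df[df.column_of_list.map(lambda x: np.isin(x, another_list).all())])
#OR, when the column has the default integer name 0:
print(df[df[0].map(lambda x: np.isin(x, another_list).all())])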
Try with real data:
import numpy as np
import pandas as pd
data = ["Knowledge Management", "modeling language"]
another_list=["modeling language","natural language"]
df = pd.DataFrame(data)
a = df[df[0].map(lambda x: np.isin(x, another_list).all())]
print(a)
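Note that np.isin compares whole elements for equality, so a title matches only if it is exactly one of the phrases. If you need substring matching, as in the question, you can swap in Series.str.contains with a regex alternation instead. That is a different technique from the np.isin answer. The titles below are made-up sample data:

```python
import re
import pandas as pd

# Hypothetical article titles
titles = pd.Series([
    "Knowledge Management in Practice",
    "A Survey of Graph Databases",
    "A modeling language for workflows",
])
list2 = ["Knowledge Management", "modeling language"]

# One alternation pattern; re.escape keeps any regex metacharacters in the phrases literal
pattern = "|".join(re.escape(p) for p in list2)
matches = titles[titles.str.contains(pattern)].tolist()
print(matches)  # ['Knowledge Management in Practice', 'A modeling language for workflows']
```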
Your list1 is a list of lists, because the csv.reader that you're using to create it always returns lists for each row, even if there's only a single item. (If you're expecting a single name from each row, I'm not sure why you're using csv here, it's only going to be a hindrance.)
Later when you check if str(j) in i as part of your filtering list comprehension, you're testing if the string j is present in the list i. Since the values in list2 are not full titles but key-phrases, you aren't going to find any matches. If you were checking in the inner strings, you'd get substring checks, but when you test list membership it must be an exact match.
Probably the best way to fix the problem is to do away with the nested lists in list1. Try creating it with:
with open('screen.csv') as f:
    list1 = [line.strip() for line in f]
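Here is a sketch that puts the two fixes together: read the titles into a flat list, then keep those that contain at least one key phrase. io.StringIO is a hypothetical stand-in for screen.csv, and the sketch assumes one title per row. Using any() also stops a title from being added twice when it contains more than one phrase, which the original double comprehension would do:

```python
import csv
import io

# Hypothetical stand-in for screen.csv: one article title per row
f = io.StringIO(
    "Knowledge Management in Practice\n"
    "A Survey of Graph Databases\n"
    "A modeling language for workflows\n"
)

# Take the single field of each non-empty row so list1 is a flat list of strings
list1 = [row[0] for row in csv.reader(f) if row]

list2 = ["Knowledge Management", "modeling language"]
# Keep each title once if it contains at least one key phrase (substring test on a str)
occur = [title for title in list1 if any(key in title for key in list2)]
print(occur)  # ['Knowledge Management in Practice', 'A modeling language for workflows']
```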
I have two lists. The first one is a list of strings called names, generated from the names of the corresponding CSV files.
names = ['ID1','ID2','ID3']
I have loaded the csv files into individual pandas dataframes and then done some preprocessing which leaves me with a list of lists, where each element is the data of each dataframe:
dfs = [['car','fast','blue'],[],['red','bike','slow']]
As you can see it can happen that after preprocessing a dataframe could be empty, which leads to an empty list in dfs.
I would like to remove that element from the list and return its index. So far I have tried this, but I get no index when printing k:
k = [i for i,x in enumerate(dfs) if not x]
I need this index so that I can then remove the element at the corresponding index in the list names.
The end results would look a bit like this:
names = ['ID1','ID3']
dfs = [['car','fast','blue'],['red','bike','slow']]
This way I can then save each individual dataframe as a csv file:
for df, name in zip(dfs, names):
    df.to_csv(name + '_.csv', index=False)
EDIT: I made a mistake: the empty element in dfs should be [], not [''].
You can use the built-in any() function:
k = [i for i, x in enumerate(dfs) if not any(x)]
The reason your
k = [i for i, x in enumerate(dfs) if not x]
doesn't work is because, regardless of what is in a list, as long as the list is not empty, the truthy value of the list will be True.
The any() function takes an iterable and returns whether any of its elements is truthy. If there are no such elements, including when the list is empty, it returns False. The truth value of an empty string, '', is False.
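To compare the two checks side by side, here is a small sketch with a made-up dfs that contains both an empty list and ['']. Keep in mind that any() also treats a list such as [0] or [None] as "empty", because it only looks at truthiness:

```python
# Hypothetical dfs containing both kinds of "empty" entry
dfs = [['car', 'fast', 'blue'], [''], [], ['red', 'bike', 'slow']]

# `not x` is True only for a list with no elements at all
empty_only = [i for i, x in enumerate(dfs) if not x]
# `not any(x)` is also True when every element is falsy, such as ['']
all_falsy = [i for i, x in enumerate(dfs) if not any(x)]

print(empty_only)  # [2]
print(all_falsy)   # [1, 2]
```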
EDIT: The question got edited, here is my updated answer:
You can try creating new lists:
names = ['ID1','ID2','ID3']
dfs = [['car','fast','blue'],[],['red','bike','slow']]
new_names = list()
new_dfs = list()
for i, x in enumerate(dfs):
    if x:
        new_names.append(names[i])
        new_dfs.append(x)
print(new_names)
print(new_dfs)
Output:
['ID1', 'ID3']
[['car', 'fast', 'blue'], ['red', 'bike', 'slow']]
If it doesn't work, try adding a print(x) to the loop to see what is going on:
names = ['ID1','ID2','ID3']
dfs = [['car','fast','blue'],[],['red','bike','slow']]
new_names = list()
new_dfs = list()
for i, x in enumerate(dfs):
    print(x)
    if x:
        new_names.append(names[i])
        new_dfs.append(x)
Since you are already using enumerate, you do not have to loop again.
Hope this solves your problem:
names = ['ID1', 'ID2', 'ID3']
dfs = [['car', 'fast', 'blue'], [''], ['red', 'bike', 'slow']]
# Iterate backwards so deleting an item does not shift the indices still to be visited
for index in reversed(range(len(dfs))):
    i = dfs[index]
    if len(i) == 1 and '' in i:
        del dfs[index]
        del names[index]
print(names)
print(dfs)
# Output
# ['ID1', 'ID3']
# [['car', 'fast', 'blue'], ['red', 'bike', 'slow']]
I think the issue is caused by [''].
l = ['']
len(l)
gives 1 as output. Hence,
not l
gives False.
If you are sure it will be [''] only, then try
dfs = [['car','fast','blue'],[''],['red','bike','slow']]
k = [i for i,x in enumerate(dfs) if len(x)==1 and x[0]=='']
this gives [1] as output
Or you can try with any(x)
Looking at the data presented, I would do the following:
Step 1: Check whether the list has any values. If it does, the check if df evaluates to True.
Step 2: Once you have the list, create a dataframe and write to csv.
The code is as shown below:
import pandas as pd
names = ['ID1','ID2','ID3']
dfs = [['car','fast','blue'],[],['red','bike','slow']]
dfx = {names[i]: df for i, df in enumerate(dfs) if df}
for name, val in dfx.items():
    df = pd.DataFrame({name: val})
    df.to_csv(name + '_.csv', index=False)
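Below is a variant that writes into a temporary folder, so the sketch can run without touching the working directory. The folder from tempfile.mkdtemp() exists only for the demo. zip pairs each name with its list directly, which avoids indexing names[i]:

```python
import os
import tempfile
import pandas as pd

names = ['ID1', 'ID2', 'ID3']
dfs = [['car', 'fast', 'blue'], [], ['red', 'bike', 'slow']]

# Keep only the name/list pairs whose list is non-empty
dfx = {name: vals for name, vals in zip(names, dfs) if vals}

out_dir = tempfile.mkdtemp()  # hypothetical output folder for this demo
for name, vals in dfx.items():
    # One column, headed by the ID, holding that file's values
    pd.DataFrame({name: vals}).to_csv(os.path.join(out_dir, name + '_.csv'), index=False)

print(sorted(os.listdir(out_dir)))  # ['ID1_.csv', 'ID3_.csv']
```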
I am trying to convert a DataFrame into a list. I have written the following, but the output I get is a list of lists. What should I do to get a flat list, or how can I convert the current output into one?
The code is below (the output was attached as an image):
import pandas as pd
import mysql.connector
region1 = mysql.connector.connect(host="localhost", user="xxxxxx", passwd="xxxxxxxx")
query1 = "SHOW DATABASES"
df1 = pd.read_sql(query1, region1)
print(df1.values.tolist())
To convert the current nested list into a single flat list, you can use nltk's flatten:
from nltk import flatten
a = [['a'], ['b'], ['c']]
print(flatten(a))
Output:
['a', 'b', 'c']
This may help:
df1 = pd.read_sql(query1, region1)
res = []
for col in df1.columns:
    # extend (not append) adds the column's values individually, giving a flat list
    res.extend(df1[col].values.tolist())
print(res)
You can use a list comprehension:
[i[0] for i in df1.values.tolist()]
Output:
['atest', 'btest', 'ctest', 'information_schema', 'mysql', 'performance_schema', 'sakila', 'sys', 'telusko', 'world']
That is for when each list inside the list has only one element in it.
If each inner list can contain several elements:
[i for j in df1.values.tolist() for i in j]
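A few other ways to reach the same flat list, shown as a sketch on a made-up single-column frame. The column name Database is an assumption about what MySQL returns for SHOW DATABASES. Options 2 and 3 also handle several columns by reading the cells row by row:

```python
import itertools
import pandas as pd

# Hypothetical stand-in for the SHOW DATABASES result: one column of names
df1 = pd.DataFrame({"Database": ["information_schema", "mysql", "sys"]})

# Option 1: take the single column as a Series and convert it directly
flat1 = df1.iloc[:, 0].tolist()

# Option 2: flatten every cell row by row from the underlying 2-D array
flat2 = df1.to_numpy().ravel().tolist()

# Option 3: flatten the nested list that df1.values.tolist() produces
flat3 = list(itertools.chain.from_iterable(df1.values.tolist()))

print(flat1)  # ['information_schema', 'mysql', 'sys']
```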
Consider a column with its unique values:
df['something'].unique() returns array(['aa','bb','a','c']).
Now I want to know which of the items start with 'a'.
My expected answer is
'aa','a'
I think the simplest option is a list comprehension with a filter:
out = [x for x in df['something'].unique() if x.startswith('a')]
print (out)
['aa', 'a']
For pandas solution use:
s = pd.Series(df['something'].unique())
out = s[s.str.startswith('a')].tolist()
print (out)
['aa', 'a']
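If the column can contain missing values, x.startswith fails on the float NaN that unique() returns. The pandas version copes with this if you pass na=False. Below is a sketch with a hypothetical column that includes a NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical column that also contains a missing value and a duplicate
df = pd.DataFrame({"something": ["aa", "bb", "a", "c", np.nan, "aa"]})

s = pd.Series(df["something"].unique())
# na=False treats missing values as "does not start with 'a'", keeping the mask boolean
out = s[s.str.startswith("a", na=False)].tolist()
print(out)  # ['aa', 'a']
```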