DtypeWarning with python
The input file looks like this, and the complete file can be found here:
OG0000008 NbL07g11380.1 NbL19g07810.1 NbL19g09170.1 NbL19g19070.1 NbQ01g01670.1 NbQ01g03330.1 NbQ01g04070.1 NbQ01g04670.1 NbQ01g05120.1 NbQ01g05870.1 NbQ01g06940.1 NbQ01g07580.1 NbQ01g08860.1 NbQ01g10050.1 NbQ01g10360.1 NbQ01g14200.1 NbQ01g14790.1 NbQ01g16080.1 NbQ01g17760.1 NbQ01g19270.1 NbQ01g19310.1 NbQ01g19390.1 NbQ01g21260.1 NbQ01g21330.1 NbQ01g21740.1 NbQ01g21910.1 NbQ01g23100.1 NbQ01g24620.1 NbQ01g25340.1 NbQ01g26060.1 NbQ01g26320.1 NbQ02g00750.1 NbQ02g03100.1 NbQ02g03420.1 NbQ02g03610.1 NbQ02g03680.1 NbQ02g05120.1 NbQ02g07460.1 NbQ02g08170.1 NbQ02g08330.1 NbQ02g09220.1 NbQ02g09400.1 NbQ02g10620.1 NbQ02g11310.1 NbQ02g14330.1 NbQ02g14460.1 NbQ02g14520.1 NbQ02g15320.1 NbQ02g17090.1 NbQ02g17130.1 NbQ02g20290.1 NbQ02g23070.1 NbQ02g23420.1 NbQ02g24450.1 NbQ02g24480.1 NbQ02g26700.1 NbQ03g00830.1 NbQ03g01970.1 NbQ03g04460.1 NbQ03g06900.1 NbQ03g09530.1 NbQ03g10620.1 NbQ03g12760.1 NbQ03g13450.1 NbQ03g15540.1 NbQ03g15640.1 NbQ03g17180.1 NbQ03g20740.1 NbQ03g21510.1 NbQ03g24670.1 NbQ04g01350.1 NbQ04g01720.1 NbQ04g08420.1 NbQ04g09090.1 NbQ04g10450.1 NbQ04g11470.1 NbQ04g12120.1 NbQ04g14130.1 NbQ04g15440.1 NbQ04g15860.1 NbQ04g16450.1 NbQ04g16620.1 NbQ04g17760.1 NbQ04g19040.1 NbQ04g20020.1 NbQ05g03320.1 NbQ05g04660.1 NbQ05g05970.1 NbQ05g07500.1 NbQ05g08900.1 NbQ05g09760.1 NbQ05g10830.1 NbQ05g11150.1 NbQ05g11340.1 NbQ05g11510.1 NbQ05g11530.1 NbQ05g11780.1 NbQ05g16980.1 NbQ05g18190.1 NbQ05g21710.1 NbQ05g23400.1 NbQ06g01110.1 NbQ06g01430.1 NbQ06g04200.1 NbQ06g04440.1 NbQ06g05330.1 NbQ06g05770.1 NbQ06g05820.1 NbQ06g06700.1 NbQ06g08620.1 NbQ06g09190.1 NbQ06g10460.1 NbQ06g15220.1 NbQ06g15330.1 NbQ06g15700.1 NbQ06g16320.1 NbQ06g16590.1 NbQ06g17590.1 NbQ06g17670.1 NbQ06g20050.1 NbQ07g01030.1 NbQ07g02010.1 NbQ07g04350.1 NbQ07g04900.1 NbQ07g05610.1 NbQ07g06200.1 NbQ07g07110.1 NbQ07g07690.1 NbQ07g08640.1 NbQ07g10390.1 NbQ07g11920.1 NbQ07g14130.1 NbQ07g15590.1 NbQ07g15620.1 NbQ07g16910.1 NbQ07g17130.1 NbQ07g17950.1 NbQ08g00060.1 NbQ08g02240.1 NbQ08g02300.1 NbQ08g02310.1 NbQ08g03290.1 NbQ08g05330.1 NbQ08g09280.1 NbQ08g14890.1 NbQ08g15820.1 NbQ08g15950.1 NbQ08g19830.1 NbQ08g20150.1 NbQ08g22050.1 NbQ08g22620.1 NbQ09g02100.1 NbQ09g02620.1 NbQ09g03950.1 NbQ09g04200.1 NbQ09g06040.1 NbQ09g06640.1 NbQ09g08160.1 NbQ09g08330.1 NbQ09g09660.1 NbQ09g11220.1 NbQ09g13860.1 NbQ09g15180.1 NbQ09g15310.1 NbQ09g16530.1 NbQ09g17900.1 NbQ09g18100.1 NbQ09g18720.1 NbQ09g19280.1 NbQ09g21840.1 NbQ10g00480.1 NbQ10g01350.1 NbQ10g02870.1 NbQ10g03640.1 NbQ10g03730.1 NbQ10g08070.1 NbQ10g09510.1 NbQ10g11010.1 NbQ10g11760.1 NbQ10g12050.1 NbQ10g12060.1 NbQ10g12910.1 NbQ10g19200.1 NbQ10g19930.1 NbQ10g20390.1 NbQ10g20730.1 NbQ10g21080.1 NbQ10g21140.1 NbQ10g24010.1 NbQ11g00310.1 NbQ11g01210.1 NbQ11g01370.1 NbQ11g04610.1 NbQ11g04800.1 NbQ11g06060.1 NbQ11g07820.1 NbQ11g08390.1 NbQ11g09100.1 NbQ11g09350.1 NbQ11g13660.1 NbQ11g13930.1 NbQ11g16260.1 NbQ11g17360.1 NbQ11g18430.1 NbQ11g21080.1 NbQ11g23280.1 NbQ11g23990.1 NbQ11g25050.1 NbQ12g03770.1 NbQ12g04850.1 NbQ12g07340.1 NbQ12g09080.1 NbQ12g10820.1 NbQ12g12070.1 NbQ12g14750.1 NbQ12g15000.1 NbQ12g15230.1 NbQ12g20380.1 NbQ12g21080.1 NbQ12g21830.1 NbQ12g23960.1 NbQ13g01300.1 NbQ13g02350.1 NbQ13g03860.1 NbQ13g04410.1 NbQ13g08800.1 NbQ13g09850.1 NbQ13g10370.1 NbQ13g11700.1 NbQ13g12420.1 NbQ13g15780.1 NbQ13g16040.1 NbQ13g23160.1 NbQ13g24120.1 NbQ13g24540.1 NbQ13g25080.1 NbQ13g25490.1 NbQ13g28240.1 NbQ13g29770.1 NbQ14g01070.1 NbQ14g03950.1 NbQ14g05360.1 NbQ14g05410.1 NbQ14g06880.1 NbQ14g07270.1 NbQ14g07500.1 NbQ14g10290.1 NbQ14g10770.1 NbQ14g14320.1 NbQ14g17890.1 NbQ14g18710.1 NbQ14g20960.1 NbQ14g22890.1 
NbQ15g00150.1 NbQ15g02300.1 NbQ15g02330.1 NbQ15g02350.1 NbQ15g03230.1 NbQ15g06190.1 NbQ15g07120.1 NbQ15g07750.1 NbQ15g09000.1 NbQ15g09050.1 NbQ15g11920.1 NbQ15g12650.1 NbQ15g12840.1 NbQ15g15670.1 NbQ15g15930.1 NbQ15g18670.1 NbQ15g19070.1 NbQ15g20620.1 NbQ15g22880.1 NbQ15g23000.1 NbQ15g26060.1 NbQ16g00880.1 NbQ16g04360.1 NbQ16g06490.1 NbQ16g09100.1 NbQ16g11020.1 NbQ16g11560.1 NbQ16g13810.1 NbQ16g13820.1 NbQ16g17040.1 NbQ16g17130.1 NbQ16g17340.1 NbQ16g18390.1 NbQ16g18430.1 NbQ16g23100.1 NbQ16g23570.1 NbQ16g24270.1 NbQ16g25200.1 NbQ16g25830.1 NbQ16g25880.1 NbQ16g25990.1 NbQ16g26610.1 NbQ16g26660.1 NbQ16g28010.1 NbQ16g28180.1 NbQ17g01150.1 NbQ17g01180.1 NbQ17g01570.1 NbQ17g01950.1 NbQ17g05460.1 NbQ17g05540.1 NbQ17g05980.1 NbQ17g07990.1 NbQ17g08300.1 NbQ17g09330.1 NbQ17g09400.1 NbQ17g10090.1 NbQ17g11220.1 NbQ17g13030.1 NbQ17g15460.1 NbQ17g16690.1 NbQ17g20980.1 NbQ17g22370.1 NbQ17g25040.1 NbQ17g28730.1 NbQ18g02140.1 NbQ18g02740.1 NbQ18g05440.1 NbQ18g06120.1 NbQ18g07470.1 NbQ18g12320.1 NbQ18g12530.1 NbQ18g12850.1 NbQ18g13840.1 NbQ18g14420.1 NbQ18g14930.1 NbQ18g15730.1 NbQ18g17750.1 NbQ18g17850.1 NbQ18g21060.1 NbQ19g01040.1 NbQ19g05480.1 NbQ19g06450.1 NbQ19g06510.1 NbQ19g08330.1 NbQ19g11840.1 NbQ19g11880.1 NbQ19g13750.1 NbQ19g14190.1 NbQ19g14210.1 NbQ19g14920.1 NbQ19g18540.1 NbQ19g19870.1 NbQ19g21020.1 NbQ19g21220.1 NbQ19g22080.1 NbQ19g22800.1 NbQ19g24690.1 NbQ19g24730.1 rna19561
OG0000001 Capann_59V1aChr01g048170.1 NbL01g00020.1 NbL01g00940.1 NbL01g02330.1 NbL01g03550.1 NbL01g03650.1 NbL01g04410.1 NbL01g04920.1 NbL01g06850.1 NbL01g16120.1 NbL01g19150.1 NbL01g20140.1 NbL01g20930.1 NbL01g22230.1 NbL01g24190.1 NbL01g24280.1 NbL01g24300.1 NbL02g00570.1 NbL02g00900.1 NbL02g01270.1 NbL02g02110.1 NbL02g02210.1 NbL02g02470.1 NbL02g03180.1 NbL02g04740.1 NbL02g04750.1 NbL02g06120.1 NbL02g06860.1 NbL02g07280.1 NbL02g07680.1 NbL02g07740.1 NbL02g09780.1 NbL02g11320.1 NbL02g12670.1 NbL02g13080.1 NbL02g14050.1 NbL02g14190.1 NbL02g15010.1 NbL02g15890.1 NbL02g16190.1 NbL02g16730.1 NbL02g17070.1 NbL02g17360.1 NbL02g18820.1 NbL02g19340.1 NbL02g20100.1 NbL02g23950.1 NbL02g24800.1 NbL03g01610.1 NbL03g01680.1 NbL03g01890.1 NbL03g02230.1 NbL03g02600.1 NbL03g03410.1 NbL03g04990.1 NbL03g05400.1 NbL03g08030.1 NbL03g08250.1 NbL03g08690.1 NbL03g10230.1 NbL03g11060.1 NbL03g13030.1 NbL03g14960.1 NbL03g15110.1 NbL03g16690.1 NbL03g16900.1 NbL03g18260.1 NbL03g18950.1 NbL03g21180.1 NbL03g21210.1 NbL03g21530.1 NbL03g22960.1 NbL03g24430.1 NbL04g01140.1 NbL04g01490.1 NbL04g02030.1 NbL04g02560.1 NbL04g03700.1 NbL04g04160.1 NbL04g05240.1 NbL04g05420.1 NbL04g05850.1 NbL04g12420.1 NbL04g12640.1 NbL04g13650.1 NbL04g13780.1 NbL04g14310.1 NbL04g16260.1 NbL04g17750.1 NbL04g18380.1 NbL04g18870.1 NbL04g19030.1 NbL04g19630.1 NbL05g00320.1 NbL05g03060.1 NbL05g03300.1 NbL05g04060.1 NbL05g07620.1 NbL05g08630.1 NbL05g09580.1 NbL05g10060.1 NbL05g11400.1 NbL05g12280.1 NbL05g13170.1 NbL05g16020.1 NbL05g17530.1 NbL05g17730.1 NbL05g18340.1 NbL05g18590.1 NbL05g18600.1 NbL05g19640.1 NbL05g20640.1 NbL05g21000.1 NbL05g21640.1 NbL05g22610.1 NbL06g00640.1 NbL06g00660.1 NbL06g02210.1 NbL06g03150.1 NbL06g03680.1 NbL06g04910.1 NbL06g07950.1 NbL06g09970.1 NbL06g11480.1 NbL06g12220.1 NbL06g12400.1 NbL06g12460.1 NbL06g12850.1 NbL06g13120.1 NbL06g14450.1 NbL06g14780.1 NbL06g16990.1 NbL06g17200.1 NbL06g17760.1 NbL06g20380.1 NbL07g02100.1 NbL07g02540.1 NbL07g02970.1 NbL07g03110.1 NbL07g04840.1 NbL07g05350.1 NbL07g06580.1 NbL07g07530.1 NbL07g08450.1 NbL07g09380.1 NbL07g09870.1 NbL07g10730.1 NbL07g10850.1 NbL07g11080.1 NbL07g12450.1 NbL07g12710.1 NbL07g13110.1 NbL07g13920.1 NbL07g14240.1 NbL07g15520.1 NbL07g16220.1 NbL07g17480.1 NbL08g01820.1 NbL08g02750.1 NbL08g02930.1 NbL08g03510.1 NbL08g03620.1 NbL08g03850.1 NbL08g03970.1 NbL08g04040.1 NbL08g06150.1 NbL08g06410.1 NbL08g06680.1 NbL08g06730.1 NbL08g07620.1 NbL08g08450.1 NbL08g08640.1 NbL08g09910.1 NbL08g10160.1 NbL08g11760.1 NbL08g12570.1 NbL08g13630.1 NbL08g13890.1 NbL08g15050.1 NbL08g15340.1 NbL08g18010.1 NbL08g18420.1 NbL08g19080.1 NbL08g19190.1 NbL09g00900.1 NbL09g02160.1 NbL09g02330.1 NbL09g02470.1 NbL09g04150.1 NbL09g05210.1 NbL09g07010.1 NbL09g09070.1 NbL09g10290.1 NbL09g10500.1 NbL09g11220.1 NbL09g13490.1 NbL09g15290.1 NbL09g15830.1 NbL09g17240.1 NbL09g19250.1 NbL09g19460.1 NbL09g20190.1 NbL09g21040.1 NbL09g21520.1 NbL09g23460.1 NbL10g00080.1 NbL10g03710.1 NbL10g04330.1 NbL10g04560.1 NbL10g05200.1 NbL10g06320.1 NbL10g07510.1 NbL10g07960.1 NbL10g08670.1 NbL10g08970.1 NbL10g11120.1 NbL10g11340.1 NbL10g11820.1 NbL10g13720.1 NbL10g14560.1 NbL10g14770.1 NbL10g16430.1 NbL10g18140.1 NbL10g18380.1 NbL10g19280.1 NbL10g19690.1 NbL10g21210.1 NbL10g22680.1 NbL10g23160.1 NbL10g23560.1 NbL10g24210.1 NbL11g00680.1 NbL11g00970.1 NbL11g01230.1 NbL11g01270.1 NbL11g01520.1 NbL11g01530.1 NbL11g02920.1 NbL11g03540.1 NbL11g03990.1 NbL11g05630.1 NbL11g08950.1 NbL11g08980.1 NbL11g09510.1 NbL11g10840.1 NbL11g11030.1 NbL11g11230.1 NbL11g12430.1 NbL11g13300.1 NbL11g15430.1 NbL11g16390.1 NbL11g16410.1 
NbL11g17320.1 NbL11g18090.1 NbL11g21310.1 NbL11g21470.1 NbL11g21780.1 NbL11g21820.1 NbL11g22270.1 NbL11g22310.1 NbL11g23180.1 NbL11g24100.1 NbL12g00130.1 NbL12g01810.1 NbL12g02230.1 NbL12g02720.1 NbL12g02760.1 NbL12g04120.1 NbL12g04550.1 NbL12g06630.1 NbL12g07830.1 NbL12g09170.1 NbL12g10580.1 NbL12g12090.1 NbL12g12490.1 NbL12g12630.1 NbL12g12800.1 NbL12g13320.1 NbL12g13460.1 NbL12g14430.1 NbL12g14970.1 NbL12g15490.1 NbL12g17460.1 NbL12g18190.1 NbL12g18590.1 NbL12g19900.1 NbL12g20690.1 NbL12g22040.1 NbL12g22560.1 NbL13g00350.1 NbL13g01440.1 NbL13g02400.1 NbL13g03210.1 NbL13g03360.1 NbL13g04070.1 NbL13g05250.1 NbL13g08460.1 NbL13g09010.1 NbL13g09140.1 NbL13g10290.1 NbL13g11570.1 NbL13g13370.1 NbL13g14910.1 NbL13g18680.1 NbL13g19510.1 NbL13g23520.1 NbL13g24010.1 NbL13g24190.1 NbL13g24460.1 NbL13g26310.1 NbL13g26640.1 NbL13g26860.1 NbL13g27260.1 NbL13g27960.1 NbL14g02460.1 NbL14g02750.1 NbL14g08750.1 NbL14g08910.1 NbL14g09120.1 NbL14g09540.1 NbL14g09920.1 NbL14g11070.1 NbL14g11150.1 NbL14g12570.1 NbL14g14530.1 NbL14g14860.1 NbL14g15240.1 NbL14g15460.1 NbL14g16620.1 NbL14g16910.1 NbL14g17940.1 NbL14g21150.1 NbL14g21750.1 NbL14g21910.1 NbL15g00790.1 NbL15g01170.1 NbL15g02310.1 NbL15g04220.1 NbL15g05970.1 NbL15g06340.1 NbL15g06440.1 NbL15g07020.1 NbL15g07370.1 NbL15g07470.1 NbL15g09010.1 NbL15g13210.1 NbL15g14550.1 NbL15g14600.1 NbL15g17290.1 NbL15g18170.1 NbL15g19710.1 NbL15g21840.1 NbL15g21930.1 NbL15g23410.1 NbL15g23420.1 NbL15g23430.1 NbL15g25130.1 NbL15g25200.1 NbL16g01760.1 NbL16g02140.1 NbL16g04460.1 NbL16g05010.1 NbL16g05020.1 NbL16g06780.1 NbL16g07540.1 NbL16g07980.1 NbL16g09760.1 NbL16g10610.1 NbL16g12320.1 NbL16g13510.1 NbL16g14420.1 NbL16g15690.1 NbL16g17420.1 NbL16g17790.1 NbL16g17880.1 NbL16g18730.1 NbL16g18940.1 NbL16g19440.1 NbL16g20980.1 NbL16g23180.1 NbL16g23610.1 NbL16g23660.1 NbL16g23910.1 NbL16g24550.1 NbL16g24640.1 NbL16g25300.1 NbL16g25630.1 NbL16g26710.1 NbL17g01590.1 NbL17g02070.1 NbL17g02120.1 NbL17g02920.1 NbL17g03040.1 NbL17g03540.1 NbL17g03700.1 NbL17g03800.1 NbL17g05400.1 NbL17g07510.1 NbL17g08450.1 NbL17g08930.1 NbL17g10090.1 NbL17g14370.1 NbL17g14600.1 NbL17g15390.1 NbL17g15900.1 NbL17g16000.1 NbL17g16910.1 NbL17g17480.1 NbL17g18240.1 NbL17g20020.1 NbL17g20830.1 NbL17g21220.1 NbL17g21690.1 NbL17g25960.1 NbL18g00030.1 NbL18g00150.1 NbL18g00310.1 NbL18g00670.1 NbL18g00700.1 NbL18g01630.1 NbL18g02650.1 NbL18g04460.1 NbL18g05210.1 NbL18g05690.1 NbL18g07270.1 NbL18g07440.1 NbL18g07500.1 NbL18g09090.1 NbL18g09810.1 NbL18g09880.1 NbL18g10500.1 NbL18g10990.1 NbL18g12070.1 NbL18g13060.1 NbL18g17480.1 NbL19g00230.1 NbL19g04470.1 NbL19g04700.1 NbL19g07770.1 NbL19g07870.1 NbL19g09260.1 NbL19g09870.1 NbL19g13480.1 NbL19g14360.1 NbL19g14400.1 NbL19g14720.1 NbL19g17700.1 NbL19g18860.1 NbL19g22750.1 NbQ13g00370.1 NbQ14g10310.1 NbQ17g06050.1
This script appears to have some problems with the file:
import pandas as pd
orthofinder_output = "../OrthoFinder-res/Results_Jan23/Orthogroups/Orthogroups-fixed.txt"
orthogroups = pd.read_csv(orthofinder_output, sep=' ', header=None)
# Extract gene family information
expansions = {}
contractions = {}
for i, row in orthogroups.iterrows():
    if len(row) > 2:
        expansions[row[0]] = row[2:]
    else:
        contractions[row[0]] = row[1]
# Print expansions and contractions
print("Expansions:", expansions)
print("Contractions:", contractions)
I got the following error:
geneFamilyExpansionsContractions.py:20: DtypeWarning: Columns (2,3,4,5,...,966,967,968) have mixed types. Specify dtype option on import or set low_memory=False.
orthogroups = pd.read_csv(orthofinder_output, sep=' ', header=None)
...
Expansions:
...
Name: 63241, Length: 967, dtype: object, 'OG0063242': 2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
...
964 NaN
965 NaN
966 NaN
967 NaN
968 NaN
Name: 63242, Length: 967, dtype: object}
Contractions: {}
How can it be fixed?
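The warning itself points at the two read_csv knobs it mentions (dtype and low_memory), and the NaN-padded output above hints at a second problem: read_csv pads every short row out to the width of the longest one, so len(row) is always the full column count and the contractions branch never runs, which is presumably why Contractions prints empty. Below is a rough sketch of both routes, reusing the path and variable names from the script above; it keeps all genes per orthogroup instead of row[2:], so treat it as a starting point under those assumptions, not the definitive fix.

import pandas as pd

orthofinder_output = "../OrthoFinder-res/Results_Jan23/Orthogroups/Orthogroups-fixed.txt"

# Option 1: keep pandas but stop the per-chunk type guessing that triggers the
# DtypeWarning -- read everything as strings (low_memory=False alone also silences it).
orthogroups = pd.read_csv(orthofinder_output, sep=' ', header=None,
                          dtype=str, low_memory=False)

# Option 2: skip the rectangular DataFrame entirely, since each orthogroup has a
# different number of genes; count the real fields per line instead of NaN-padded columns.
expansions = {}
contractions = {}
with open(orthofinder_output) as handle:
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        og_id, genes = fields[0], fields[1:]
        if len(genes) > 1:          # same threshold as `len(row) > 2` in the script
            expansions[og_id] = genes
        else:
            contractions[og_id] = genes[0] if genes else None

print("Expansions:", len(expansions), "orthogroups")
print("Contractions:", len(contractions), "orthogroups")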
Related
python not recognizing pandas_ta module
import requests
import pandas as pd
import pandas_ta as ta

def stochFourMonitor():
    k_period = 14
    d_period = 3
    data = get_data('BTC-PERP', 14400, 1642935495, 1643165895)
    print(data)
    data = data['result']
    df = pd.DataFrame(data)
    df['trailingHigh'] = df['high'].rolling(k_period).max()
    df['trailingLow'] = df['low'].rolling(k_period).min()
    df['%K'] = (df['close'] - df['trailingLow']) * 100 / (df['trailingHigh'] - df['trailingLow'])
    df['%D'] = df['%K'].rolling(d_period).mean()
    df.index.name = 'test'
    df.set_index(pd.DatetimeIndex(df["startTime"]), inplace=True)
    print(df)
    df.drop(columns=['startTime'])
    print(df)
    df.ta.stoch(high='High', low='Low', close='Close', k=14, d=3, append=True)
    #t = ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)
    #df.ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)

def get_data(marketName, resolution, start_time, end_time):
    data = requests.get('https://ftx.com/api/markets/' + marketName + '/candles?resolution=' + str(resolution) + '&start_time=' + str(start_time) + '&end_time=' + str(end_time)).json()
    return data

I keep receiving the error 'NoneType' object has no attribute 'name'. See below for the full exception. It seems like the code is not recognizing the pandas_ta module, but I don't understand why. Any help troubleshooting would be much appreciated.

Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: )
'NoneType' object has no attribute 'name'
  File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 21, in stochFourMonitor
    df.ta.stoch(high='High', low='Low',close= 'Close', k=14, d=3, append=True)
  File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 31, in (Current frame)
    stochFourMonitor()
You have too few values in your dataframe. You need at least 17 values (k=14, d=3):

>>> pd.Timestamp(1642935495, unit='s')
Timestamp('2022-01-23 10:58:15')
>>> pd.Timestamp(1643165895, unit='s')
Timestamp('2022-01-26 02:58:15')

>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result'])
0   2022-01-23T12:00:00+00:00  1.642939e+12  35690.0  36082.0  35000.0  35306.0  6.315513e+08
1   2022-01-23T16:00:00+00:00  1.642954e+12  35306.0  35460.0  34601.0  34785.0  7.246238e+08
2   2022-01-23T20:00:00+00:00  1.642968e+12  34785.0  36551.0  34712.0  36271.0  9.663773e+08
3   2022-01-24T00:00:00+00:00  1.642982e+12  36271.0  36283.0  35148.0  35351.0  6.007333e+08
4   2022-01-24T04:00:00+00:00  1.642997e+12  35351.0  35511.0  34821.0  34896.0  5.554126e+08
5   2022-01-24T08:00:00+00:00  1.643011e+12  34895.0  35610.0  33033.0  33709.0  1.676436e+09
6   2022-01-24T12:00:00+00:00  1.643026e+12  33709.0  34399.0  32837.0  34260.0  2.021096e+09
7   2022-01-24T16:00:00+00:00  1.643040e+12  34261.0  36493.0  33800.0  36101.0  1.989552e+09
8   2022-01-24T20:00:00+00:00  1.643054e+12  36101.0  37596.0  35990.0  36673.0  1.202684e+09
9   2022-01-25T00:00:00+00:00  1.643069e+12  36673.0  36702.0  35974.0  36431.0  4.538093e+08
10  2022-01-25T04:00:00+00:00  1.643083e+12  36431.0  36500.0  35719.0  36067.0  3.514587e+08
11  2022-01-25T08:00:00+00:00  1.643098e+12  36067.0  36824.0  36030.0  36431.0  5.830712e+08
12  2022-01-25T12:00:00+00:00  1.643112e+12  36431.0  37200.0  35997.0  36568.0  9.992247e+08
13  2022-01-25T16:00:00+00:00  1.643126e+12  36568.0  37600.0  36532.0  37079.0  8.225219e+08
14  2022-01-25T20:00:00+00:00  1.643141e+12  37077.0  37140.0  36437.0  36980.0  7.892745e+08
15  2022-01-26T00:00:00+00:00  1.643155e+12  36980.0  37242.0  36567.0  37238.0  3.226400e+08

>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result'])
...
AttributeError: 'NoneType' object has no attribute 'name'

Now replace 1642935495 ('2022-01-23 10:58:15') with 1642845495 ('2022-01-22 10:58:15'):

>>> pd.DataFrame(get_data('BTC-PERP',14400,1642845495,1643165895)['result']).ta.stoch()
    STOCHk_14_3_3  STOCHd_14_3_3
13            NaN            NaN
14            NaN            NaN
15      80.824814            NaN
16      74.665546            NaN
17      72.970512      76.153624
18      73.930097      73.855385
19      80.993469      75.964693
20      84.814444      79.912670
21      89.775352      85.194422
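If the time window cannot simply be widened, one defensive option is to check the row count before calling the indicator. This is my own sketch, not part of the answer; the minimum below follows pandas_ta's column naming shown above (STOCHk_14_3_3 implies smooth_k=3) and matches the answer's output, where the first %D value appears at row 17, i.e. the 18th candle.

import pandas as pd
import pandas_ta as ta  # registers the DataFrame .ta accessor

def safe_stoch(df, k=14, d=3, smooth_k=3):
    # Rough minimum for at least one non-NaN %D row; treat it as an estimate.
    min_rows = k + d + smooth_k - 2
    if len(df) < min_rows:
        raise ValueError(f"need roughly {min_rows} candles, got {len(df)}")
    return df.ta.stoch(high='high', low='low', close='close',
                       k=k, d=d, smooth_k=smooth_k)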
Save geocoding results from address to longitude and latitude to original dataframe in Python
Given a small dataset df as follows:

   id             name  address
0   1        ABC tower   北京市朝阳区
1   2          AC park   北京市海淀区
2   3      ZR hospital  上海市黄浦区
3   4  Fengtai library      NaN
4   5     Square Point   上海市虹口区

I would like to obtain the longitude and latitude for the address column and append them to the original dataframe. Please note there are NaNs in the address column. The code below gives me a table with addresses, longitude and latitude, but it ignores the NaN address rows; the code should also be improved:

import pandas as pd
import requests
import json

df = df[df['address'].notna()]

res = []
for addre in df['address']:
    url = "http://restapi.amap.com/v3/geocode/geo?key=f057101329c0200f170be166d9b023a1&address=" + addre
    dat = {
        'count': "1",
    }
    r = requests.post(url, data = json.dumps(dat))
    s = r.json()
    infos = s['geocodes']
    for j in range(0, 10000):
        # print(j)
        try:
            more_infos = infos[j]
            # print(more_infos)
        except:
            continue
        try:
            data = more_infos['location']
            # print(data)
        except:
            continue
        try:
            lon_lat = data.split(',')
            lon = float(lon_lat[0])
            lat = float(lon_lat[1])
        except:
            continue
        res.append([addre, lon, lat])

result = pd.DataFrame(res)
result.columns = ['address', 'longitude', 'latitude']
print(result)
result.to_excel('result.xlsx', index = False)

Out:

   address    longitude   latitude
0  北京市朝阳区  116.601144  39.948574
1  北京市海淀区  116.329519  39.972134
2  上海市黄浦区  121.469240  31.229860
3  上海市虹口区  121.505133  31.264600

But how could I get the final result as follows? Thanks for your kind help in advance.

   id             name  address     longitude   latitude
0   1        ABC tower   北京市朝阳区  116.601144  39.948574
1   2          AC park   北京市海淀区  116.329519  39.972134
2   3      ZR hospital  上海市黄浦区  121.469240  31.229860
3   4  Fengtai library      NaN          NaN        NaN
4   5     Square Point   上海市虹口区  121.505133  31.264600
Use pd.merge, as result is the longitude & latitude dataframe:

dfn = pd.merge(df, result, on='address', how='left')

or:

for _, row in df.iterrows():
    _id = row['id']
    name = row['name']
    addre = row['address']
    if pd.isna(row['address']):
        res.append([_id, name, addre, None, None])
        continue
    ###### same code ######
    url = '...'
    # ...
    ###### same code ######
    res.append([_id, name, addre, lon, lat])

result = pd.DataFrame(res)
result.columns = ['id', 'name', 'address', 'longitude', 'latitude']
print(result)
result.to_excel('result.xlsx', index = False)
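As a quick sanity check of the merge route (column names and values copied from the question; this is a sketch, not part of the original answer), how='left' keeps the rows whose address is NaN, just with empty coordinates:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [3, 4],
                   'name': ['ZR hospital', 'Fengtai library'],
                   'address': ['上海市黄浦区', np.nan]})
result = pd.DataFrame({'address': ['上海市黄浦区'],
                       'longitude': [121.469240],
                       'latitude': [31.229860]})

dfn = pd.merge(df, result, on='address', how='left')
print(dfn)  # the NaN-address row survives with NaN longitude/latitude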
How to reshape data in Python?
I have a data set as given below:

Timestamp = 22-05-2019 08:40 :Light = 64.00 :Temp_Soil = 20.5625 :Temp_Air = 23.1875 :Soil_Moisture_1 = 756 :Soil_Moisture_2 = 780 :Soil_Moisture_3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :Temp_Soil = 20.5625 :Temp_Air = 23.125 :Soil_Moisture_1 = 755 :Soil_Moisture_2 = 782 :Soil_Moisture_3 = 1002

I want to reshape (rearrange) the dataset in Python so that the header columns are [Timestamp, Light, Temp_Soil, Temp_Air, Soil_Moisture_1, Soil_Moisture_2, Soil_Moisture_3] and each record becomes a row of values under those columns.
One possible solution. Instead of a "true" input file, I used a string:

inp="""Timestamp = 22-05-2019 08:40 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.1875 :SoilMoist1 = 756 :SoilMoist2 = 780 :SoilMoist3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.125 :SoilMoist1 = 755 :SoilMoist2 = 782 :SoilMoist3 = 1002"""
buf = pd.compat.StringIO(inp)

To avoid "folding" of output lines, I shortened the field names. Then let's create the result DataFrame and a list of "rows" to append to it. For now, both of them are empty:

df = pd.DataFrame(columns=['Timestamp', 'Light', 'TempSoil', 'TempAir', 'SoilMoist1', 'SoilMoist2', 'SoilMoist3'])
src = []

Below is a loop processing the input rows:

while True:
    line = buf.readline()
    if not(line):                           # EOF
        break
    lst = re.split(r' :', line.rstrip())    # Field list
    if len(lst) < 2:                        # Skip empty source lines
        continue
    dct = {}                                # Source "row" (dictionary)
    for elem in lst:                        # Process fields
        k, v = re.split(r' = ', elem)
        dct[k] = v                          # Add field : value to "row"
    src.append(dct)

And the last step is to append the rows from src to df:

df = df.append(src, ignore_index=True, sort=False)

When you print(df), for my test data, you will get:

          Timestamp  Light TempSoil  TempAir SoilMoist1 SoilMoist2 SoilMoist3
0  22-05-2019 08:40  64.00  20.5625  23.1875        756        780       1002
1  22-05-2019 08:42  64.00  20.5625   23.125        755        782       1002

For now all columns are of string type, so you can change the required columns to either float or int:

df.Light = pd.to_numeric(df.Light)
df.TempSoil = pd.to_numeric(df.TempSoil)
df.TempAir = pd.to_numeric(df.TempAir)
df.SoilMoist1 = pd.to_numeric(df.SoilMoist1)
df.SoilMoist2 = pd.to_numeric(df.SoilMoist2)
df.SoilMoist3 = pd.to_numeric(df.SoilMoist3)

Note that the to_numeric() function is clever enough to recognize the possible type to convert to, so the first 3 columns changed their type to float64 and the next 3 to int64. You can check it by executing df.info(). One more possible conversion is to change the Timestamp column to DateTime type:

df.Timestamp = pd.to_datetime(df.Timestamp)
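One caveat if you run this on a recent pandas: pd.compat.StringIO is gone (use io.StringIO from the standard library) and DataFrame.append was removed in pandas 2.0. A minimal re-spelling of the same parsing idea under those assumptions, with a shortened sample string:

import re
from io import StringIO            # replaces pd.compat.StringIO
import pandas as pd

inp = ("Timestamp = 22-05-2019 08:40 :Light = 64.00 :TempSoil = 20.5625\n"
       "Timestamp = 22-05-2019 08:42 :Light = 64.00 :TempSoil = 20.5625\n")

src = []
for line in StringIO(inp):                     # iterate lines directly, no readline() loop
    fields = re.split(r' :', line.rstrip())
    if len(fields) < 2:                        # skip empty lines
        continue
    src.append(dict(re.split(r' = ', f) for f in fields))

# DataFrame.append is gone; build the frame from the list of dicts (or use pd.concat).
df = pd.DataFrame(src)
print(df)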
Handling exceptions with df.apply
I am using the tld python library to grab the first-level domain from proxy request logs using an apply function. When I run into a strange request that tld doesn't know how to handle, like 'http:1 CON' or 'http:/login.cgi%00', I get an error message like this:

TldBadUrl: Is not a valid URL http:1 con!

TldBadUrlTraceback (most recent call last)
in engine
----> 1 new_fld_column = request_2['request'].apply(get_fld)

/usr/local/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
->  2355             mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)()

/home/cdsw/.local/lib/python2.7/site-packages/tld/utils.pyc in get_fld(url, fail_silently, fix_protocol, search_public, search_private, **kwargs)
    385         fix_protocol=fix_protocol,
    386         search_public=search_public,
--> 387         search_private=search_private
    388     )
    389

/home/cdsw/.local/lib/python2.7/site-packages/tld/utils.pyc in process_url(url, fail_silently, fix_protocol, search_public, search_private)
    289             return None, None, parsed_url
    290         else:
--> 291             raise TldBadUrl(url=url)
    292
    293     domain_parts = domain_name.split('.')

In the meantime I have been weeding these out with many lines like the following, but there are hundreds or thousands of them in this dataset:

request_2 = request_1[request_1['request'] != 'http:1 CON']
request_2 = request_1[request_1['request'] != 'http:/login.cgi%00']

Dataframe (request):

                                  request_url    count
0           https://login.microsoftonline.com    24521
1              https://dt.adsafeprotected.com    11521
2         https://googleads.g.doubleclick.net     6252
3                   https://fls-na.amazon.com    65225
4  https://v10.vortex-win.data.microsoft.com  7852222
5                        https://ib.adnxs.com       12

The code:

from tld import get_tld
from tld import get_fld
from impala.dbapi import connect
from impala.util import as_pandas
import pandas as pd
import numpy as np

request = pd.read_csv('Proxy/Proxy_Analytics/Request_Grouped_By_Request_Count_12032018.csv')

# Remove rows where there were null values in the request column
request = request[pd.notnull(request['request'])]
# Reset index
request.reset_index(drop=True)

# Find the urls that contain IP addresses and exclude them from the new dataframe
request_1 = request[~request.request.str.findall(r'[0-9]+(?:\.[0-9]+){3}').astype(bool)]
# Reset index
request_1 = request_1.reset_index(drop=True)

# Apply the get_fld lib on the request column
new_fld_column = request_2['request'].apply(get_fld)

Is there any way to keep this error from firing and instead add the rows that would error out to a separate dataframe?
If you can wrap your function in a try-except clause, you can determine which rows error out by querying those rows with NaN:

import tld
from tld import get_fld

def try_get_fld(x):
    try:
        return get_fld(x)
    except tld.exceptions.TldBadUrl:
        return np.nan

print(df)

                                  request_url    count
0           https://login.microsoftonline.com    24521
1              https://dt.adsafeprotected.com    11521
2         https://googleads.g.doubleclick.net     6252
3                   https://fls-na.amazon.com    65225
4  https://v10.vortex-win.data.microsoft.com  7852222
5                        https://ib.adnxs.com       12
6                                  http:1 CON       10
7                          http:/login.cgi%00      200

df['flds'] = df['request_url'].apply(try_get_fld)

print(df['flds'])

0    microsoftonline.com
1    adsafeprotected.com
2        doubleclick.net
3             amazon.com
4          microsoft.com
5              adnxs.com
6                    NaN
7                    NaN
Name: flds, dtype: object

faulty_url_df = df[df['flds'].isna()]

print(faulty_url_df)

           request_url  count  flds
6           http:1 CON     10   NaN
7  http:/login.cgi%00     200   NaN
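As an alternative to the wrapper, the traceback in the question shows that get_fld itself takes a fail_silently argument. Assuming it returns None for bad URLs (worth checking against the tld docs for your version), the same split can be done without a custom function; the tiny dataframe below is just illustrative:

import pandas as pd
from tld import get_fld

df = pd.DataFrame({'request_url': ['https://fls-na.amazon.com', 'http:1 CON']})

# fail_silently=True should suppress TldBadUrl and yield None for bad URLs;
# isna() then flags those rows just like the NaN from the try/except wrapper.
df['flds'] = df['request_url'].apply(lambda u: get_fld(u, fail_silently=True))
faulty_url_df = df[df['flds'].isna()]
print(faulty_url_df)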
Searching ID from a text file
I am trying to search for a student ID in a text file and display the line if the ID is found. Here is the code:

sid = input('\nPlease enter the student ID you want to search: ')
found = False
for line in student_file:
    line = line.rstrip()
    if sid == line[0]:
        found = True
        print(line)
        print('\n')
if found == False:
    print("No student record under this ID.")

The text file contains the student ID, name and marks of different subjects:

1235 abc 0.0 0.0 0.0 0.0 0.0
1111 def 19.0 20.0 30.0 20.3 12.3
1 ghi 100.0 100.0 100.0 100.0 100.0
5 jkl 100.0 100.0 100.0 100.0 100.0

If the input sid is 1, it shows the details of the students with IDs 1235, 1111 and 1; if the input is 1235, it displays "no student record under this ID"; if the input is 5, it shows the student details for ID 5. All I am trying to do is display the student record for the matching ID. I don't know where I am going wrong.
Instead of using line[0], which is only the first character, you need to check the first word of the line, because sid can be more than one character long. You can do this by splitting the string on spaces and selecting the first segment with [0]:

if line.split(" ")[0] == sid:

Optionally, you could do:

if sid in line.split(" "):
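Putting that fix back into the original loop, a minimal corrected version might look like this (the file name is hypothetical, since the question never shows how student_file is opened):

sid = input('\nPlease enter the student ID you want to search: ')

found = False
with open('students.txt') as student_file:   # hypothetical file name
    for line in student_file:
        line = line.rstrip()
        if not line:
            continue
        if line.split(" ")[0] == sid:        # compare the whole first field, not line[0]
            found = True
            print(line)

if not found:
    print("No student record under this ID.")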