DtypeWarning with python - python

The input file looks like this, and the complete file can be found here:
OG0000008 NbL07g11380.1 NbL19g07810.1 NbL19g09170.1 NbL19g19070.1 NbQ01g01670.1 NbQ01g03330.1 NbQ01g04070.1 NbQ01g04670.1 NbQ01g05120.1 NbQ01g05870.1 NbQ01g06940.1 NbQ01g07580.1 NbQ01g08860.1 NbQ01g10050.1 NbQ01g10360.1 NbQ01g14200.1 NbQ01g14790.1 NbQ01g16080.1 NbQ01g17760.1 NbQ01g19270.1 NbQ01g19310.1 NbQ01g19390.1 NbQ01g21260.1 NbQ01g21330.1 NbQ01g21740.1 NbQ01g21910.1 NbQ01g23100.1 NbQ01g24620.1 NbQ01g25340.1 NbQ01g26060.1 NbQ01g26320.1 NbQ02g00750.1 NbQ02g03100.1 NbQ02g03420.1 NbQ02g03610.1 NbQ02g03680.1 NbQ02g05120.1 NbQ02g07460.1 NbQ02g08170.1 NbQ02g08330.1 NbQ02g09220.1 NbQ02g09400.1 NbQ02g10620.1 NbQ02g11310.1 NbQ02g14330.1 NbQ02g14460.1 NbQ02g14520.1 NbQ02g15320.1 NbQ02g17090.1 NbQ02g17130.1 NbQ02g20290.1 NbQ02g23070.1 NbQ02g23420.1 NbQ02g24450.1 NbQ02g24480.1 NbQ02g26700.1 NbQ03g00830.1 NbQ03g01970.1 NbQ03g04460.1 NbQ03g06900.1 NbQ03g09530.1 NbQ03g10620.1 NbQ03g12760.1 NbQ03g13450.1 NbQ03g15540.1 NbQ03g15640.1 NbQ03g17180.1 NbQ03g20740.1 NbQ03g21510.1 NbQ03g24670.1 NbQ04g01350.1 NbQ04g01720.1 NbQ04g08420.1 NbQ04g09090.1 NbQ04g10450.1 NbQ04g11470.1 NbQ04g12120.1 NbQ04g14130.1 NbQ04g15440.1 NbQ04g15860.1 NbQ04g16450.1 NbQ04g16620.1 NbQ04g17760.1 NbQ04g19040.1 NbQ04g20020.1 NbQ05g03320.1 NbQ05g04660.1 NbQ05g05970.1 NbQ05g07500.1 NbQ05g08900.1 NbQ05g09760.1 NbQ05g10830.1 NbQ05g11150.1 NbQ05g11340.1 NbQ05g11510.1 NbQ05g11530.1 NbQ05g11780.1 NbQ05g16980.1 NbQ05g18190.1 NbQ05g21710.1 NbQ05g23400.1 NbQ06g01110.1 NbQ06g01430.1 NbQ06g04200.1 NbQ06g04440.1 NbQ06g05330.1 NbQ06g05770.1 NbQ06g05820.1 NbQ06g06700.1 NbQ06g08620.1 NbQ06g09190.1 NbQ06g10460.1 NbQ06g15220.1 NbQ06g15330.1 NbQ06g15700.1 NbQ06g16320.1 NbQ06g16590.1 NbQ06g17590.1 NbQ06g17670.1 NbQ06g20050.1 NbQ07g01030.1 NbQ07g02010.1 NbQ07g04350.1 NbQ07g04900.1 NbQ07g05610.1 NbQ07g06200.1 NbQ07g07110.1 NbQ07g07690.1 NbQ07g08640.1 NbQ07g10390.1 NbQ07g11920.1 NbQ07g14130.1 NbQ07g15590.1 NbQ07g15620.1 NbQ07g16910.1 NbQ07g17130.1 NbQ07g17950.1 NbQ08g00060.1 NbQ08g02240.1 NbQ08g02300.1 NbQ08g02310.1 NbQ08g03290.1 NbQ08g05330.1 NbQ08g09280.1 NbQ08g14890.1 NbQ08g15820.1 NbQ08g15950.1 NbQ08g19830.1 NbQ08g20150.1 NbQ08g22050.1 NbQ08g22620.1 NbQ09g02100.1 NbQ09g02620.1 NbQ09g03950.1 NbQ09g04200.1 NbQ09g06040.1 NbQ09g06640.1 NbQ09g08160.1 NbQ09g08330.1 NbQ09g09660.1 NbQ09g11220.1 NbQ09g13860.1 NbQ09g15180.1 NbQ09g15310.1 NbQ09g16530.1 NbQ09g17900.1 NbQ09g18100.1 NbQ09g18720.1 NbQ09g19280.1 NbQ09g21840.1 NbQ10g00480.1 NbQ10g01350.1 NbQ10g02870.1 NbQ10g03640.1 NbQ10g03730.1 NbQ10g08070.1 NbQ10g09510.1 NbQ10g11010.1 NbQ10g11760.1 NbQ10g12050.1 NbQ10g12060.1 NbQ10g12910.1 NbQ10g19200.1 NbQ10g19930.1 NbQ10g20390.1 NbQ10g20730.1 NbQ10g21080.1 NbQ10g21140.1 NbQ10g24010.1 NbQ11g00310.1 NbQ11g01210.1 NbQ11g01370.1 NbQ11g04610.1 NbQ11g04800.1 NbQ11g06060.1 NbQ11g07820.1 NbQ11g08390.1 NbQ11g09100.1 NbQ11g09350.1 NbQ11g13660.1 NbQ11g13930.1 NbQ11g16260.1 NbQ11g17360.1 NbQ11g18430.1 NbQ11g21080.1 NbQ11g23280.1 NbQ11g23990.1 NbQ11g25050.1 NbQ12g03770.1 NbQ12g04850.1 NbQ12g07340.1 NbQ12g09080.1 NbQ12g10820.1 NbQ12g12070.1 NbQ12g14750.1 NbQ12g15000.1 NbQ12g15230.1 NbQ12g20380.1 NbQ12g21080.1 NbQ12g21830.1 NbQ12g23960.1 NbQ13g01300.1 NbQ13g02350.1 NbQ13g03860.1 NbQ13g04410.1 NbQ13g08800.1 NbQ13g09850.1 NbQ13g10370.1 NbQ13g11700.1 NbQ13g12420.1 NbQ13g15780.1 NbQ13g16040.1 NbQ13g23160.1 NbQ13g24120.1 NbQ13g24540.1 NbQ13g25080.1 NbQ13g25490.1 NbQ13g28240.1 NbQ13g29770.1 NbQ14g01070.1 NbQ14g03950.1 NbQ14g05360.1 NbQ14g05410.1 NbQ14g06880.1 NbQ14g07270.1 NbQ14g07500.1 NbQ14g10290.1 NbQ14g10770.1 NbQ14g14320.1 NbQ14g17890.1 NbQ14g18710.1 NbQ14g20960.1 NbQ14g22890.1 
NbQ15g00150.1 NbQ15g02300.1 NbQ15g02330.1 NbQ15g02350.1 NbQ15g03230.1 NbQ15g06190.1 NbQ15g07120.1 NbQ15g07750.1 NbQ15g09000.1 NbQ15g09050.1 NbQ15g11920.1 NbQ15g12650.1 NbQ15g12840.1 NbQ15g15670.1 NbQ15g15930.1 NbQ15g18670.1 NbQ15g19070.1 NbQ15g20620.1 NbQ15g22880.1 NbQ15g23000.1 NbQ15g26060.1 NbQ16g00880.1 NbQ16g04360.1 NbQ16g06490.1 NbQ16g09100.1 NbQ16g11020.1 NbQ16g11560.1 NbQ16g13810.1 NbQ16g13820.1 NbQ16g17040.1 NbQ16g17130.1 NbQ16g17340.1 NbQ16g18390.1 NbQ16g18430.1 NbQ16g23100.1 NbQ16g23570.1 NbQ16g24270.1 NbQ16g25200.1 NbQ16g25830.1 NbQ16g25880.1 NbQ16g25990.1 NbQ16g26610.1 NbQ16g26660.1 NbQ16g28010.1 NbQ16g28180.1 NbQ17g01150.1 NbQ17g01180.1 NbQ17g01570.1 NbQ17g01950.1 NbQ17g05460.1 NbQ17g05540.1 NbQ17g05980.1 NbQ17g07990.1 NbQ17g08300.1 NbQ17g09330.1 NbQ17g09400.1 NbQ17g10090.1 NbQ17g11220.1 NbQ17g13030.1 NbQ17g15460.1 NbQ17g16690.1 NbQ17g20980.1 NbQ17g22370.1 NbQ17g25040.1 NbQ17g28730.1 NbQ18g02140.1 NbQ18g02740.1 NbQ18g05440.1 NbQ18g06120.1 NbQ18g07470.1 NbQ18g12320.1 NbQ18g12530.1 NbQ18g12850.1 NbQ18g13840.1 NbQ18g14420.1 NbQ18g14930.1 NbQ18g15730.1 NbQ18g17750.1 NbQ18g17850.1 NbQ18g21060.1 NbQ19g01040.1 NbQ19g05480.1 NbQ19g06450.1 NbQ19g06510.1 NbQ19g08330.1 NbQ19g11840.1 NbQ19g11880.1 NbQ19g13750.1 NbQ19g14190.1 NbQ19g14210.1 NbQ19g14920.1 NbQ19g18540.1 NbQ19g19870.1 NbQ19g21020.1 NbQ19g21220.1 NbQ19g22080.1 NbQ19g22800.1 NbQ19g24690.1 NbQ19g24730.1 rna19561
OG0000001 Capann_59V1aChr01g048170.1 NbL01g00020.1 NbL01g00940.1 NbL01g02330.1 NbL01g03550.1 NbL01g03650.1 NbL01g04410.1 NbL01g04920.1 NbL01g06850.1 NbL01g16120.1 NbL01g19150.1 NbL01g20140.1 NbL01g20930.1 NbL01g22230.1 NbL01g24190.1 NbL01g24280.1 NbL01g24300.1 NbL02g00570.1 NbL02g00900.1 NbL02g01270.1 NbL02g02110.1 NbL02g02210.1 NbL02g02470.1 NbL02g03180.1 NbL02g04740.1 NbL02g04750.1 NbL02g06120.1 NbL02g06860.1 NbL02g07280.1 NbL02g07680.1 NbL02g07740.1 NbL02g09780.1 NbL02g11320.1 NbL02g12670.1 NbL02g13080.1 NbL02g14050.1 NbL02g14190.1 NbL02g15010.1 NbL02g15890.1 NbL02g16190.1 NbL02g16730.1 NbL02g17070.1 NbL02g17360.1 NbL02g18820.1 NbL02g19340.1 NbL02g20100.1 NbL02g23950.1 NbL02g24800.1 NbL03g01610.1 NbL03g01680.1 NbL03g01890.1 NbL03g02230.1 NbL03g02600.1 NbL03g03410.1 NbL03g04990.1 NbL03g05400.1 NbL03g08030.1 NbL03g08250.1 NbL03g08690.1 NbL03g10230.1 NbL03g11060.1 NbL03g13030.1 NbL03g14960.1 NbL03g15110.1 NbL03g16690.1 NbL03g16900.1 NbL03g18260.1 NbL03g18950.1 NbL03g21180.1 NbL03g21210.1 NbL03g21530.1 NbL03g22960.1 NbL03g24430.1 NbL04g01140.1 NbL04g01490.1 NbL04g02030.1 NbL04g02560.1 NbL04g03700.1 NbL04g04160.1 NbL04g05240.1 NbL04g05420.1 NbL04g05850.1 NbL04g12420.1 NbL04g12640.1 NbL04g13650.1 NbL04g13780.1 NbL04g14310.1 NbL04g16260.1 NbL04g17750.1 NbL04g18380.1 NbL04g18870.1 NbL04g19030.1 NbL04g19630.1 NbL05g00320.1 NbL05g03060.1 NbL05g03300.1 NbL05g04060.1 NbL05g07620.1 NbL05g08630.1 NbL05g09580.1 NbL05g10060.1 NbL05g11400.1 NbL05g12280.1 NbL05g13170.1 NbL05g16020.1 NbL05g17530.1 NbL05g17730.1 NbL05g18340.1 NbL05g18590.1 NbL05g18600.1 NbL05g19640.1 NbL05g20640.1 NbL05g21000.1 NbL05g21640.1 NbL05g22610.1 NbL06g00640.1 NbL06g00660.1 NbL06g02210.1 NbL06g03150.1 NbL06g03680.1 NbL06g04910.1 NbL06g07950.1 NbL06g09970.1 NbL06g11480.1 NbL06g12220.1 NbL06g12400.1 NbL06g12460.1 NbL06g12850.1 NbL06g13120.1 NbL06g14450.1 NbL06g14780.1 NbL06g16990.1 NbL06g17200.1 NbL06g17760.1 NbL06g20380.1 NbL07g02100.1 NbL07g02540.1 NbL07g02970.1 NbL07g03110.1 NbL07g04840.1 NbL07g05350.1 NbL07g06580.1 NbL07g07530.1 NbL07g08450.1 NbL07g09380.1 NbL07g09870.1 NbL07g10730.1 NbL07g10850.1 NbL07g11080.1 NbL07g12450.1 NbL07g12710.1 NbL07g13110.1 NbL07g13920.1 NbL07g14240.1 NbL07g15520.1 NbL07g16220.1 NbL07g17480.1 NbL08g01820.1 NbL08g02750.1 NbL08g02930.1 NbL08g03510.1 NbL08g03620.1 NbL08g03850.1 NbL08g03970.1 NbL08g04040.1 NbL08g06150.1 NbL08g06410.1 NbL08g06680.1 NbL08g06730.1 NbL08g07620.1 NbL08g08450.1 NbL08g08640.1 NbL08g09910.1 NbL08g10160.1 NbL08g11760.1 NbL08g12570.1 NbL08g13630.1 NbL08g13890.1 NbL08g15050.1 NbL08g15340.1 NbL08g18010.1 NbL08g18420.1 NbL08g19080.1 NbL08g19190.1 NbL09g00900.1 NbL09g02160.1 NbL09g02330.1 NbL09g02470.1 NbL09g04150.1 NbL09g05210.1 NbL09g07010.1 NbL09g09070.1 NbL09g10290.1 NbL09g10500.1 NbL09g11220.1 NbL09g13490.1 NbL09g15290.1 NbL09g15830.1 NbL09g17240.1 NbL09g19250.1 NbL09g19460.1 NbL09g20190.1 NbL09g21040.1 NbL09g21520.1 NbL09g23460.1 NbL10g00080.1 NbL10g03710.1 NbL10g04330.1 NbL10g04560.1 NbL10g05200.1 NbL10g06320.1 NbL10g07510.1 NbL10g07960.1 NbL10g08670.1 NbL10g08970.1 NbL10g11120.1 NbL10g11340.1 NbL10g11820.1 NbL10g13720.1 NbL10g14560.1 NbL10g14770.1 NbL10g16430.1 NbL10g18140.1 NbL10g18380.1 NbL10g19280.1 NbL10g19690.1 NbL10g21210.1 NbL10g22680.1 NbL10g23160.1 NbL10g23560.1 NbL10g24210.1 NbL11g00680.1 NbL11g00970.1 NbL11g01230.1 NbL11g01270.1 NbL11g01520.1 NbL11g01530.1 NbL11g02920.1 NbL11g03540.1 NbL11g03990.1 NbL11g05630.1 NbL11g08950.1 NbL11g08980.1 NbL11g09510.1 NbL11g10840.1 NbL11g11030.1 NbL11g11230.1 NbL11g12430.1 NbL11g13300.1 NbL11g15430.1 NbL11g16390.1 NbL11g16410.1 
NbL11g17320.1 NbL11g18090.1 NbL11g21310.1 NbL11g21470.1 NbL11g21780.1 NbL11g21820.1 NbL11g22270.1 NbL11g22310.1 NbL11g23180.1 NbL11g24100.1 NbL12g00130.1 NbL12g01810.1 NbL12g02230.1 NbL12g02720.1 NbL12g02760.1 NbL12g04120.1 NbL12g04550.1 NbL12g06630.1 NbL12g07830.1 NbL12g09170.1 NbL12g10580.1 NbL12g12090.1 NbL12g12490.1 NbL12g12630.1 NbL12g12800.1 NbL12g13320.1 NbL12g13460.1 NbL12g14430.1 NbL12g14970.1 NbL12g15490.1 NbL12g17460.1 NbL12g18190.1 NbL12g18590.1 NbL12g19900.1 NbL12g20690.1 NbL12g22040.1 NbL12g22560.1 NbL13g00350.1 NbL13g01440.1 NbL13g02400.1 NbL13g03210.1 NbL13g03360.1 NbL13g04070.1 NbL13g05250.1 NbL13g08460.1 NbL13g09010.1 NbL13g09140.1 NbL13g10290.1 NbL13g11570.1 NbL13g13370.1 NbL13g14910.1 NbL13g18680.1 NbL13g19510.1 NbL13g23520.1 NbL13g24010.1 NbL13g24190.1 NbL13g24460.1 NbL13g26310.1 NbL13g26640.1 NbL13g26860.1 NbL13g27260.1 NbL13g27960.1 NbL14g02460.1 NbL14g02750.1 NbL14g08750.1 NbL14g08910.1 NbL14g09120.1 NbL14g09540.1 NbL14g09920.1 NbL14g11070.1 NbL14g11150.1 NbL14g12570.1 NbL14g14530.1 NbL14g14860.1 NbL14g15240.1 NbL14g15460.1 NbL14g16620.1 NbL14g16910.1 NbL14g17940.1 NbL14g21150.1 NbL14g21750.1 NbL14g21910.1 NbL15g00790.1 NbL15g01170.1 NbL15g02310.1 NbL15g04220.1 NbL15g05970.1 NbL15g06340.1 NbL15g06440.1 NbL15g07020.1 NbL15g07370.1 NbL15g07470.1 NbL15g09010.1 NbL15g13210.1 NbL15g14550.1 NbL15g14600.1 NbL15g17290.1 NbL15g18170.1 NbL15g19710.1 NbL15g21840.1 NbL15g21930.1 NbL15g23410.1 NbL15g23420.1 NbL15g23430.1 NbL15g25130.1 NbL15g25200.1 NbL16g01760.1 NbL16g02140.1 NbL16g04460.1 NbL16g05010.1 NbL16g05020.1 NbL16g06780.1 NbL16g07540.1 NbL16g07980.1 NbL16g09760.1 NbL16g10610.1 NbL16g12320.1 NbL16g13510.1 NbL16g14420.1 NbL16g15690.1 NbL16g17420.1 NbL16g17790.1 NbL16g17880.1 NbL16g18730.1 NbL16g18940.1 NbL16g19440.1 NbL16g20980.1 NbL16g23180.1 NbL16g23610.1 NbL16g23660.1 NbL16g23910.1 NbL16g24550.1 NbL16g24640.1 NbL16g25300.1 NbL16g25630.1 NbL16g26710.1 NbL17g01590.1 NbL17g02070.1 NbL17g02120.1 NbL17g02920.1 NbL17g03040.1 NbL17g03540.1 NbL17g03700.1 NbL17g03800.1 NbL17g05400.1 NbL17g07510.1 NbL17g08450.1 NbL17g08930.1 NbL17g10090.1 NbL17g14370.1 NbL17g14600.1 NbL17g15390.1 NbL17g15900.1 NbL17g16000.1 NbL17g16910.1 NbL17g17480.1 NbL17g18240.1 NbL17g20020.1 NbL17g20830.1 NbL17g21220.1 NbL17g21690.1 NbL17g25960.1 NbL18g00030.1 NbL18g00150.1 NbL18g00310.1 NbL18g00670.1 NbL18g00700.1 NbL18g01630.1 NbL18g02650.1 NbL18g04460.1 NbL18g05210.1 NbL18g05690.1 NbL18g07270.1 NbL18g07440.1 NbL18g07500.1 NbL18g09090.1 NbL18g09810.1 NbL18g09880.1 NbL18g10500.1 NbL18g10990.1 NbL18g12070.1 NbL18g13060.1 NbL18g17480.1 NbL19g00230.1 NbL19g04470.1 NbL19g04700.1 NbL19g07770.1 NbL19g07870.1 NbL19g09260.1 NbL19g09870.1 NbL19g13480.1 NbL19g14360.1 NbL19g14400.1 NbL19g14720.1 NbL19g17700.1 NbL19g18860.1 NbL19g22750.1 NbQ13g00370.1 NbQ14g10310.1 NbQ17g06050.1
This script appears to have some problems with the file:
import pandas as pd

orthofinder_output = "../OrthoFinder-res/Results_Jan23/Orthogroups/Orthogroups-fixed.txt"
orthogroups = pd.read_csv(orthofinder_output, sep=' ', header=None)

# Extract gene family information
expansions = {}
contractions = {}
for i, row in orthogroups.iterrows():
    if len(row) > 2:
        expansions[row[0]] = row[2:]
    else:
        contractions[row[0]] = row[1]

# Print expansions and contractions
print("Expansions:", expansions)
print("Contractions:", contractions)
I got the following error:
geneFamilyExpansionsContractions.py:20: DtypeWarning: Columns (2,3,4,...,968) have mixed types. Specify dtype option on import or set low_memory=False.
orthogroups = pd.read_csv(orthofinder_output, sep=' ', header=None)
...
Expansions:
...
Name: 63241, Length: 967, dtype: object, 'OG0063242': 2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
...
964 NaN
965 NaN
966 NaN
967 NaN
968 NaN
Name: 63242, Length: 967, dtype: object}
Contractions: {}
How can it be fixed?
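The warning itself points at the two standard workarounds: specify a dtype on import or set low_memory=False. Below is a minimal sketch of both directions, assuming the goal is simply a mapping from each orthogroup ID to its list of genes; forcing every column to str stops the chunk-by-chunk type guessing that triggers the warning, and parsing the file line by line avoids the NaN padding entirely, since every orthogroup has a different number of genes:

import pandas as pd

orthofinder_output = "../OrthoFinder-res/Results_Jan23/Orthogroups/Orthogroups-fixed.txt"

# Option 1: keep read_csv, but read every column as a string so pandas
# does not infer mixed types chunk by chunk (the cause of DtypeWarning).
orthogroups = pd.read_csv(orthofinder_output, sep=' ', header=None, dtype=str)

# Option 2: skip read_csv and split each line on whitespace; each
# orthogroup then keeps only its real genes, with no NaN padding.
families = {}
with open(orthofinder_output) as fh:
    for line in fh:
        parts = line.split()
        if parts:
            families[parts[0]] = parts[1:]
# len(families[og]) now gives the family size directly.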

Related

python not recognizing pandas_ta module

import requests
import pandas as pd
import pandas_ta as ta

def stochFourMonitor():
    k_period = 14
    d_period = 3
    data = get_data('BTC-PERP', 14400, 1642935495, 1643165895)
    print(data)
    data = data['result']
    df = pd.DataFrame(data)
    df['trailingHigh'] = df['high'].rolling(k_period).max()
    df['trailingLow'] = df['low'].rolling(k_period).min()
    df['%K'] = (df['close'] - df['trailingLow']) * 100 / (df['trailingHigh'] - df['trailingLow'])
    df['%D'] = df['%K'].rolling(d_period).mean()
    df.index.name = 'test'
    df.set_index(pd.DatetimeIndex(df["startTime"]), inplace=True)
    print(df)
    df.drop(columns=['startTime'])
    print(df)
    df.ta.stoch(high='High', low='Low', close='Close', k=14, d=3, append=True)
    #t = ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)
    #df.ta.stoch(close='close',high='high', low='low', k=14, d=3, append=True)

def get_data(marketName, resolution, start_time, end_time):
    data = requests.get('https://ftx.com/api/markets/' + marketName + '/candles?resolution=' + str(resolution) + '&start_time=' + str(start_time) + '&end_time=' + str(end_time)).json()
    return data
I keep receiving the error 'NoneType' object has no attribute 'name'. See below for the full exception. It seems like the code is not recognizing the pandas_ta module, but I don't understand why. Any help troubleshooting would be much appreciated.
Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: )
'NoneType' object has no attribute 'name'
File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 21, in stochFourMonitor
df.ta.stoch(high='High', low='Low',close= 'Close', k=14, d=3, append=True)
File "C:\Users\Jason\Documents\TradingCode\FTX Websocket\testing21.py", line 31, in (Current frame)
stochFourMonitor()
You have too few values in your dataframe. You need at least 17 values (k=14, d=3).
>>> pd.Timestamp(1642935495, unit='s')
Timestamp('2022-01-23 10:58:15')
>>> pd.Timestamp(1643165895, unit='s')
Timestamp('2022-01-26 02:58:15')
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result'])
0 2022-01-23T12:00:00+00:00 1.642939e+12 35690.0 36082.0 35000.0 35306.0 6.315513e+08
1 2022-01-23T16:00:00+00:00 1.642954e+12 35306.0 35460.0 34601.0 34785.0 7.246238e+08
2 2022-01-23T20:00:00+00:00 1.642968e+12 34785.0 36551.0 34712.0 36271.0 9.663773e+08
3 2022-01-24T00:00:00+00:00 1.642982e+12 36271.0 36283.0 35148.0 35351.0 6.007333e+08
4 2022-01-24T04:00:00+00:00 1.642997e+12 35351.0 35511.0 34821.0 34896.0 5.554126e+08
5 2022-01-24T08:00:00+00:00 1.643011e+12 34895.0 35610.0 33033.0 33709.0 1.676436e+09
6 2022-01-24T12:00:00+00:00 1.643026e+12 33709.0 34399.0 32837.0 34260.0 2.021096e+09
7 2022-01-24T16:00:00+00:00 1.643040e+12 34261.0 36493.0 33800.0 36101.0 1.989552e+09
8 2022-01-24T20:00:00+00:00 1.643054e+12 36101.0 37596.0 35990.0 36673.0 1.202684e+09
9 2022-01-25T00:00:00+00:00 1.643069e+12 36673.0 36702.0 35974.0 36431.0 4.538093e+08
10 2022-01-25T04:00:00+00:00 1.643083e+12 36431.0 36500.0 35719.0 36067.0 3.514587e+08
11 2022-01-25T08:00:00+00:00 1.643098e+12 36067.0 36824.0 36030.0 36431.0 5.830712e+08
12 2022-01-25T12:00:00+00:00 1.643112e+12 36431.0 37200.0 35997.0 36568.0 9.992247e+08
13 2022-01-25T16:00:00+00:00 1.643126e+12 36568.0 37600.0 36532.0 37079.0 8.225219e+08
14 2022-01-25T20:00:00+00:00 1.643141e+12 37077.0 37140.0 36437.0 36980.0 7.892745e+08
15 2022-01-26T00:00:00+00:00 1.643155e+12 36980.0 37242.0 36567.0 37238.0 3.226400e+08
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642935495,1643165895)['result']).ta.stoch()
...
AttributeError: 'NoneType' object has no attribute 'name'
Now change 1642935495 ('2022-01-23 10:58:15') to 1642845495 ('2022-01-22 10:58:15'):
>>> pd.DataFrame(get_data('BTC-PERP',14400,1642845495,1643165895)['result']).ta.stoch()
STOCHk_14_3_3 STOCHd_14_3_3
13 NaN NaN
14 NaN NaN
15 80.824814 NaN
16 74.665546 NaN
17 72.970512 76.153624
18 73.930097 73.855385
19 80.993469 75.964693
20 84.814444 79.912670
21 89.775352 85.194422
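A minimal sketch of a guard that follows from this, assuming df is the frame built from the FTX candles above (which use lowercase column names): with k=14 and d=3 the stochastic needs at least 17 rows, so check the length, or widen the start/end window, before calling df.ta.stoch():

k_period, d_period = 14, 3
min_rows = k_period + d_period  # 17 rows, per the point above

if len(df) < min_rows:
    # Not enough candles: fetch a wider time window instead of calling stoch
    print(f"Only {len(df)} candles; need at least {min_rows} for stoch(k={k_period}, d={d_period})")
else:
    df.ta.stoch(high='high', low='low', close='close', k=k_period, d=d_period, append=True)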

Save geocoding results from address to longitude and latitude to original dataframe in Python

Given a small dataset df as follows:
id name address
0 1 ABC tower 北京市朝阳区
1 2 AC park 北京市海淀区
2 3 ZR hospital 上海市黄浦区
3 4 Fengtai library NaN
4 5 Square Point 上海市虹口区
I would like to obtain the longitude and latitude for the address column and append them to the original dataframe. Please note there are NaNs in the address column.
The code below gives me a table with addresses, longitude and latitude, but it ignores the NaN address rows; the code could also be improved:
import pandas as pd
import requests
import json

df = df[df['address'].notna()]
res = []
for addre in df['address']:
    url = "http://restapi.amap.com/v3/geocode/geo?key=f057101329c0200f170be166d9b023a1&address=" + addre
    dat = {
        'count': "1",
    }
    r = requests.post(url, data=json.dumps(dat))
    s = r.json()
    infos = s['geocodes']
    for j in range(0, 10000):
        # print(j)
        try:
            more_infos = infos[j]
            # print(more_infos)
        except:
            continue
        try:
            data = more_infos['location']
            # print(data)
        except:
            continue
        try:
            lon_lat = data.split(',')
            lon = float(lon_lat[0])
            lat = float(lon_lat[1])
        except:
            continue
        res.append([addre, lon, lat])

result = pd.DataFrame(res)
result.columns = ['address', 'longitude', 'latitude']
print(result)
result.to_excel('result.xlsx', index=False)
Out:
address longitude latitude
0 北京市朝阳区 116.601144 39.948574
1 北京市海淀区 116.329519 39.972134
2 上海市黄浦区 121.469240 31.229860
3 上海市虹口区 121.505133 31.264600
But how could I get the final result as follows? Thanks for your kind help in advance.
id name address longitude latitude
0 1 ABC tower 北京市朝阳区 116.601144 39.948574
1 2 AC park 北京市海淀区 116.329519 39.972134
2 3 ZR hospital 上海市黄浦区 121.469240 31.229860
3 4 Fengtai library NaN NaN NaN
4 5 Square Point 上海市虹口区 121.505133 31.264600
Use pd.merge, as result is the longitude & latitude dataframe.
dfn = pd.merge(df, result, on='address', how='left')
or
for _, row in df.iterrows():
    _id = row['id']
    name = row['name']
    addre = row['address']
    if pd.isna(row['address']):
        res.append([_id, name, addre, None, None])
        continue
    ###### same code ######
    url = '...'
    # ...
    ###### same code ######
    res.append([_id, name, addre, lon, lat])

result = pd.DataFrame(res)
result.columns = ['id', 'name', 'address', 'longitude', 'latitude']
print(result)
result.to_excel('result.xlsx', index=False)
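For completeness, here is a minimal self-contained sketch of the first (merge) approach, with the coordinates copied from the output above standing in for the live geocoding results, so the NaN handling is visible: how='left' keeps the Fengtai library row and leaves its longitude and latitude as NaN.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['ABC tower', 'AC park', 'ZR hospital', 'Fengtai library', 'Square Point'],
    'address': ['北京市朝阳区', '北京市海淀区', '上海市黄浦区', np.nan, '上海市虹口区'],
})

# `result` as produced by the geocoding loop (values taken from the output above)
result = pd.DataFrame({
    'address': ['北京市朝阳区', '北京市海淀区', '上海市黄浦区', '上海市虹口区'],
    'longitude': [116.601144, 116.329519, 121.469240, 121.505133],
    'latitude': [39.948574, 39.972134, 31.229860, 31.264600],
})

# Rows whose address is NaN simply get NaN longitude/latitude after the left join
dfn = pd.merge(df, result, on='address', how='left')
print(dfn)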

How to reshape data in Python?

I have a data set as given below-
Timestamp = 22-05-2019 08:40 :Light = 64.00 :Temp_Soil = 20.5625 :Temp_Air = 23.1875 :Soil_Moisture_1 = 756 :Soil_Moisture_2 = 780 :Soil_Moisture_3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :Temp_Soil = 20.5625 :Temp_Air = 23.125 :Soil_Moisture_1 = 755 :Soil_Moisture_2 = 782 :Soil_Moisture_3 = 1002
And I want to Reshape(rearrange) the dataset to orient header columns like [Timestamp, Light, Temp_Soil, Temp_Air, Soil_Moisture_1, Soil_Moisture_2, Soil_Moisture_3] and their values as the row entry in Python.
One of possible solutions:
Instead of a "true" input file, I used a string:
inp="""Timestamp = 22-05-2019 08:40 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.1875 :SoilMoist1 = 756 :SoilMoist2 = 780 :SoilMoist3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.125 :SoilMoist1 = 755 :SoilMoist2 = 782 :SoilMoist3 = 1002"""
import re
import pandas as pd

buf = pd.compat.StringIO(inp)
To avoid "folding" of output lines, I shortened field names.
Then let's create the result DataFrame and a list of "rows" to append to it.
For now - both of them are empty.
df = pd.DataFrame(columns=['Timestamp', 'Light', 'TempSoil', 'TempAir',
                           'SoilMoist1', 'SoilMoist2', 'SoilMoist3'])
src = []
Below is a loop processing input rows:
while True:
    line = buf.readline()
    if not(line):                            # EOF
        break
    lst = re.split(r' :', line.rstrip())     # Field list
    if len(lst) < 2:                         # Skip empty source lines
        continue
    dct = {}                                 # Source "row" (dictionary)
    for elem in lst:                         # Process fields
        k, v = re.split(r' = ', elem)
        dct[k] = v                           # Add field : value to "row"
    src.append(dct)
And the last step is to append rows from src to df :
df = df.append(src, ignore_index=True, sort=False)
When you print(df), for my test data, you will get:
Timestamp Light TempSoil TempAir SoilMoist1 SoilMoist2 SoilMoist3
0 22-05-2019 08:40 64.00 20.5625 23.1875 756 780 1002
1 22-05-2019 08:42 64.00 20.5625 23.125 755 782 1002
For now all columns are of string type, so you can change the required
columns to either float or int:
df.Light = pd.to_numeric(df.Light)
df.TempSoil = pd.to_numeric(df.TempSoil)
df.TempAir = pd.to_numeric(df.TempAir)
df.SoilMoist1 = pd.to_numeric(df.SoilMoist1)
df.SoilMoist2 = pd.to_numeric(df.SoilMoist2)
df.SoilMoist3 = pd.to_numeric(df.SoilMoist3)
Note that the to_numeric() function is clever enough to recognize the type to convert to, so the first 3 columns changed their type to float64 and the next 3 to int64.
You can check it executing df.info().
One more possible conversion is to change the Timestamp column to datetime type:
df.Timestamp = pd.to_datetime(df.Timestamp)
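On recent pandas versions the same idea needs two small changes, since pd.compat.StringIO is gone (use io.StringIO) and DataFrame.append has been removed; a minimal sketch under those assumptions is to collect the row dictionaries first and build the frame in a single call:

import io
import re
import pandas as pd

inp = """Timestamp = 22-05-2019 08:40 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.1875 :SoilMoist1 = 756 :SoilMoist2 = 780 :SoilMoist3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.125 :SoilMoist1 = 755 :SoilMoist2 = 782 :SoilMoist3 = 1002"""

src = []
for line in io.StringIO(inp):
    fields = re.split(r' :', line.rstrip())
    if len(fields) < 2:                                     # skip empty lines
        continue
    src.append(dict(re.split(r' = ', f) for f in fields))   # field name -> value

df = pd.DataFrame(src)
df['Timestamp'] = pd.to_datetime(df['Timestamp'], dayfirst=True)
df[df.columns[1:]] = df[df.columns[1:]].apply(pd.to_numeric)
print(df.dtypes)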

Handling exceptions with df.apply

I am using the tld python library to grab the first-level domain from the proxy request logs using an apply function.
When I run into a strange request that tld doesn't know how to handle, like 'http:1 CON' or 'http:/login.cgi%00', I run into an error message like this:
TldBadUrl: Is not a valid URL http:1 con!
TldBadUrlTraceback (most recent call last)
in engine
----> 1 new_fld_column = request_2['request'].apply(get_fld)
/usr/local/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)()
/home/cdsw/.local/lib/python2.7/site-packages/tld/utils.pyc in get_fld(url,
fail_silently, fix_protocol, search_public, search_private, **kwargs)
385 fix_protocol=fix_protocol,
386 search_public=search_public,
--> 387 search_private=search_private
388 )
389
/home/cdsw/.local/lib/python2.7/site-packages/tld/utils.pyc in process_url(url, fail_silently, fix_protocol, search_public, search_private)
289 return None, None, parsed_url
290 else:
--> 291 raise TldBadUrl(url=url)
292
293 domain_parts = domain_name.split('.')
In the meantime I have been weeding these out using many lines like the following code, but there are hundreds or thousands of them in this dataset:
request_2 = request_1[request_1['request'] != 'http:1 CON']
request_2 = request_1[request_1['request'] != 'http:/login.cgi%00']
Dataframe:
request
request_url count
0 https://login.microsoftonline.com 24521
1 https://dt.adsafeprotected.com 11521
2 https://googleads.g.doubleclick.net 6252
3 https://fls-na.amazon.com 65225
4 https://v10.vortex-win.data.microsoft.com 7852222
5 https://ib.adnxs.com 12
The code:
from tld import get_tld
from tld import get_fld
from impala.dbapi import connect
from impala.util import as_pandas
import pandas as pd
import numpy as np
request = pd.read_csv('Proxy/Proxy_Analytics/Request_Grouped_By_Request_Count_12032018.csv')
#Remove rows where there were null values in the request column
request = request[pd.notnull(request['request'])]
#Reset index
request.reset_index(drop=True)
#Find the urls that contain IP addresses and exclude them from the new dataframe
request_1 = request[~request.request.str.findall(r'[0-9]+(?:\.[0-9]+){3}').astype(bool)]
#Reset index
request_1 = request_1.reset_index(drop=True)
#Apply the get_fld lib on the request column
new_fld_column = request_2['request'].apply(get_fld)
Is there any way to keep this error from firing and instead add the rows that would error to a separate dataframe?
If you wrap your function in a try-except clause, you can find the rows that error out by querying the rows whose result is NaN:
import numpy as np
import tld
from tld import get_fld

def try_get_fld(x):
    try:
        return get_fld(x)
    except tld.exceptions.TldBadUrl:
        return np.nan
print(df)
request_url count
0 https://login.microsoftonline.com 24521
1 https://dt.adsafeprotected.com 11521
2 https://googleads.g.doubleclick.net 6252
3 https://fls-na.amazon.com 65225
4 https://v10.vortex-win.data.microsoft.com 7852222
5 https://ib.adnxs.com 12
6 http:1 CON 10
7 http:/login.cgi%00 200
df['flds'] = df['request_url'].apply(try_get_fld)
print(df['flds'])
0 microsoftonline.com
1 adsafeprotected.com
2 doubleclick.net
3 amazon.com
4 microsoft.com
5 adnxs.com
6 NaN
7 NaN
Name: flds, dtype: object
faulty_url_df = df[df['flds'].isna()]
print(faulty_url_df)
request_url count flds
6 http:1 CON 10 NaN
7 http:/login.cgi%00 200 NaN
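An alternative sketch, assuming the installed tld version supports the fail_silently flag on get_fld (which returns None instead of raising on malformed URLs); the failing rows can then be split off the same way:

from tld import get_fld

# None for URLs tld cannot parse, instead of raising TldBadUrl
df['flds'] = df['request_url'].apply(lambda u: get_fld(u, fail_silently=True))
faulty_url_df = df[df['flds'].isna()]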

Searching ID from a text file

I am trying to search for a student ID from a text file and display the line if an ID is found.
Here is the code:
sid = input('\nPlease enter the student ID you want to search: ')
found = False
for line in student_file:
    line = line.rstrip()
    if sid == line[0]:
        found = True
        print(line)
        print('\n')
if found == False:
    print("No student record under this ID.")
The text file contains the student ID, name, and marks for different subjects:
1235 abc 0.0 0.0 0.0 0.0 0.0
1111 def 19.0 20.0 30.0 20.3 12.3
1 ghi 100.0 100.0 100.0 100.0 100.0
5 jkl 100.0 100.0 100.0 100.0 100.0
Here, if the input sid is 1, it shows the details of the students with IDs 1235, 1111, and 1; if the input is 1235, it displays "no student record under this ID"; if the input is 5, it shows the student details for ID=5.
All I am trying to do is display the student record for the matched ID. I don't know where I am going wrong.
Instead of using line[0], which is the first character, you need to check the first word of the line, because sid can be multiple characters.
You can do this by splitting the string on spaces and then selecting the first segment using [0]:
if line.split(" ")[0] == sid:
Optionally, you could do:
if sid in line.split(" "):
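Putting it together, a minimal self-contained sketch of the corrected search, assuming the records live in a file named students.txt (the file name is hypothetical, since the question does not show how student_file is opened):

sid = input('\nPlease enter the student ID you want to search: ')
found = False
with open('students.txt') as student_file:  # hypothetical file name
    for line in student_file:
        line = line.rstrip()
        # Compare the whole first word, not just the first character
        if line and line.split()[0] == sid:
            found = True
            print(line)
if not found:
    print("No student record under this ID.")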
