python - Pandas read_excel mangles non-dates to dates




From the following zip file:

    wget http://www.nature.com/nature/journal/v498/n7453/extref/nature12172-s1.zip
    unzip nature12172-s1.zip

I'm reading supplementary_table2.xlsx, which has unusual gene names as row IDs, such as "2010002n04rik". Some of these IDs are parsed as dates, not as regular strings!

    import pandas as pd

    expression = pd.read_excel('nature12172-s1/supplementary_table2.xlsx',
                               "supptable2final.txt",
                               # Need to specify the index column as both the first and last columns,
                               # because the last column is the "gene category"
                               index_col=[0, -1],
                               parse_dates=False, infer_datetime_format=False)
    expression.index

MultiIndex(levels=[[100043387, 2013-03-01 00:00:00, 2013-03-02 00:00:00, 2013-03-03 00:00:00, 2013-03-04 00:00:00, 2013-03-05 00:00:00, 2013-03-06 00:00:00, 2013-03-07 00:00:00, u'0610007l01rik', u'0610007p14rik', u'0610007p22rik', u'0610008f07rik', u'0610009b22rik', u'0610009d07rik', u'0610009o20rik', u'0610010b08rik', u'0610010f05rik', u'0610010k06rik', 2013-03-08 00:00:00, 2013-03-09 00:00:00, 2013-03-10 00:00:00, 2013-03-11 00:00:00, u'0610010k14rik', u'0610010o12rik', u'0610011f06rik', u'0610011l14rik', u'0610012g03rik', u'0610012h03rik', u'0610030e20rik', u'0610031j06rik', u'0610037l13rik', u'0610037p05rik', u'0610038b21rik', u'0610039k10rik', u'0610040b10rik', u'0610040j01rik', u'0910001l09rik', 2013-04-03 00:00:00, 2013-09-01 00:00:00, 2013-09-02 00:00:00, 2013-09-03 00:00:00, 2013-09-04 00:00:00, 2013-09-05 00:00:00, 2013-09-06 00:00:00, 2013-09-07 00:00:00, 2013-09-08 00:00:00, 2013-09-09 00:00:00, 2013-09-10 00:00:00, 2013-09-11 00:00:00, 2013-09-12 00:00:00, 2013-09-14 00:00:00, 2013-09-15 00:00:00, u'1100001g20rik', u'1110001a16rik', u'1110001j03rik', u'1110002b05rik', u'1110002l01rik', u'1110002n22rik', u'1110003e01rik', u'1110004e09rik', u'1110004f10rik', u'1110005a03rik', u'1110006o24rik', u'1110007c09rik', u'1110008f13rik', u'1110008j03rik', u'1110008l16rik', u'1110008p14rik', u'1110012d08rik', u'1110012j17rik', u'1110012l19rik', u'1110014n23rik', u'1110017f19rik', u'1110018g07rik', u'1110018h23rik', u'1110018j18rik', u'1110020a21rik', u'1110020g09rik', u'1110021j02rik', u'1110021l09rik', u'1110028c15rik', u'1110031i02rik', u'1110032a03rik', u'1110032a04rik', u'1110032f04rik', u'1110034a24rik', u'1110034b05rik', u'1110034g24rik', u'1110037f02rik', u'1110038b12rik', 
u'1110038d17rik', u'1110038f14rik', u'1110049f12rik', u'1110051m20rik', u'1110054o05rik', u'1110057k04rik', u'1110058l19rik', u'1110059e24rik', u'1110059g10rik', u'1110059m19rik', ...], [u'housekeeping', u'lps response']], labels=[[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 0, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, ...], [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ...]], names=[u'gene', u'gene category'])

I've tried every combination of parse_dates=False and infer_datetime_format=False, which the documentation claims is the default, and yet these IDs are still parsed as dates, not as strings?
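To see exactly which row IDs came back as datetimes, the index can be filtered by type. The following is only a minimal sketch: `gene_ids` is a small hypothetical stand-in for the first level of `expression.index`, and `find_date_like` is a helper name invented for illustration. It identifies the affected entries but does not explain why they were converted.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the first level of expression.index:
# a mix of strings, an integer, and datetime objects, as in the output above.
gene_ids = pd.Index(
    [
        "0610007l01rik",
        datetime.datetime(2013, 3, 1),
        100043387,
        datetime.datetime(2013, 9, 2),
        "1100001g20rik",
    ],
    dtype=object,
)


def find_date_like(index):
    """Return the entries of `index` that are datetime objects.

    pd.Timestamp subclasses datetime.datetime, so both are caught.
    """
    return [value for value in index if isinstance(value, datetime.datetime)]


date_like = find_date_like(gene_ids)
print(len(date_like), "of", len(gene_ids), "IDs were read as datetimes")
```

On the real data, the same helper could be applied to `expression.index.levels[0]`.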

Edit: I do not get the same thing when I read a text file that has the same data:

    wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/gse41nnn/gse41265/suppl/gse41265_allgenestpm.txt.gz

    look = pd.read_table("gse41265_allgenestpm.txt.gz", compression="gzip", index_col=0)
    look.index

Index([u'xkr4', u'ab338584', u'b3gat2', u'npl', u't2', u't', u'pde10a', u'1700010i14rik', u'6530411m01rik', u'pabpc6', u'ak019626', u'ak020722', u'qk', u'b930003m22rik', u'rgs8', u'pacrg', u'ak038428', u'ak163153', u'park2', u'ak080902', u'agpat4', u'map3k4', u'ak029100', u'plg', u'slc22a3', u'rgs16', u'ak021075', u'slc22a2', u'slc22a1', u'igf2r', u'airn', u'mas1', u'mrgprh', u'pnldc1', u'mrpl18', u'tcp1', u'rnasel', u'snora20', u'acat3', u'acat2', u'wtap', u'sod2', u'gpr31c', u'tcp10c', u'ttll2', u'unc93a', u'gm10512', u'rgsl1', u'smok2a', u'smok2b', u'ak036897', u'bc068229', u'smok(tcr)', u'ak143195', u'ak008572', u'tcte2', u'mllt4', u'5830403l16rik', u'gm7168', u'dact2', u'smoc2', u'4930474m22rik', u'thbs2', u'wdr27', u'ak004434', u'1600012h06rik', u'phf10', u'loc106740', u'gm5531', u'tcte3', u'9030025p20rik', u'ak050117', u'gm3435', u'gm10510', u'dll1', u'teddm1', u'fam120b', u'psmb1', u'tbp', u'pdcd2', u'prdm9', u'chd1', u'rgmb', u'zfp960', u'ak138383', u'zfp97', u'glul', u'ak164331', u'riok2', u'lix1', u'ak164875', u'lnpep', u'vmn2r90', u'mir99b', u'mirlet7e', u'mir125a', u'4930546h06rik', u'ak043564', u'has1', u'fpr1', ...], dtype='object')
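The text-file behaviour can be reproduced without downloading anything. This is a small sketch that assumes a tab-separated layout; the column names `gene` and `s1` and the numeric values are made up for illustration, and `pd.read_csv(..., sep="\t")` is used in place of `pd.read_table`. Only the three gene IDs are taken from the output above.

```python
import io

import pandas as pd

# Inline tab-separated data standing in for the downloaded text file.
# Column names and expression values are hypothetical.
tsv = io.StringIO(
    "gene\ts1\n"
    "xkr4\t0.0\n"
    "1700010i14rik\t1.5\n"
    "6530411m01rik\t2.25\n"
)

look = pd.read_csv(tsv, sep="\t", index_col=0)

# Every row ID is a plain string; nothing is converted to a date.
print(look.index.tolist())
print(all(isinstance(gene, str) for gene in look.index))
```

If the same check fails on the index read from the Excel file, the difference lies in how the spreadsheet values reach pandas rather than in the IDs themselves.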

python pandas
