match - using a lookup table in R with varying counts of data -
Hi, I've been working on this issue all weekend. I'm trying to do a simple lookup, but my lookup table has a different number of entries per lookup key.

Let's say I have 2 tables. table1 (there are more columns of data, but they are irrelevant to the problem; this is a sample of 3 rows):

    genename   col1      col2
    hggr       .554444
    brac4      .333222
    fam34      .111222
My lookup table is a table of gene groups, each followed by its respective genes. The lookup table can have a varying number of columns, depending on how many genes are in the group. This is just a small example; the real table has 20-30 genes per group. table2 (example of 2 rows):

    genegroupname      col1    col2    col3
    chr1_45000_46000   hggr    brac4
    chr1_67000_70000   fam34
What I want is a column in table1 that shows the corresponding gene group:

    finalresulttable
    col1               col2    col3
    chr1_45000_46000   hggr    .554444
    chr1_45000_46000   brac4   .333222
    chr1_67000_70000   fam34   .111222
The code I have so far is:

    finalresult <- cbind(gene_group[match(table1[, 1], gene_group[, 2]), 1], table1)

But of course this only works for genes found in the 2nd column of the gene grouping table! I need to search through the whole table and return the row number.

Any help? Thanks in advance.

David
One way is to convert table 2 to long format, with a column genegroupname and a single column of member genes, and then use match.
    # Example data: 12 genes in random order, and 4 groups of 3 genes each
    (table1 <- data.frame(genename=sample(letters[1:12]), col2=runif(12)))
    #    genename      col2
    # 1         f 0.6116285
    # 2         l 0.5752088
    # 3         j 0.7499011
    # 4         d 0.9405068
    # 5           0.9360968
    # 6         k 0.6549850
    # 7           0.7070163
    # 8         e 0.3521952
    # 9         c 0.4234293
    # 10        g 0.7750203
    # 11        b 0.1418680
    # 12        h 0.6632382

    (table2 <- data.frame(genegroupname=1:4, g1=letters[1:4], g2=letters[5:8], g3=letters[9:12]))
    #   genegroupname g1 g2 g3
    # 1             1  a  e  i
    # 2             2  b  f  j
    # 3             3  c  g  k
    # 4             4  d  h  l

    # Stack every gene column (all columns except the first) into one column;
    # reshape() names the stacked column after the first of them, g1
    (table2.long <- reshape(table2, direction='long', varying=list(-1), timevar='gene'))
    #     genegroupname gene g1 id
    # 1.1             1    1  a  1
    # 2.1             2    1  b  2
    # 3.1             3    1  c  3
    # 4.1             4    1  d  4
    # 1.2             1    2  e  1
    # 2.2             2    2  f  2
    # 3.2             3    2  g  3
    # 4.2             4    2  h  4
    # 1.3             1    3  i  1
    # 2.3             2    3  j  2
    # 3.3             3    3  k  3
    # 4.3             4    3  l  4

    # Look up each gene in the stacked column and pull the matching group name
    table1$grp <- table2.long$genegroupname[match(table1$genename, table2.long$g1)]
    table1
    #    genename      col2 grp
    # 1         f 0.6116285   2
    # 2         l 0.5752088   4
    # 3         j 0.7499011   2
    # 4         d 0.9405068   4
    # 5           0.9360968   1
    # 6         k 0.6549850   3
    # 7           0.7070163   1
    # 8         e 0.3521952   1
    # 9         c 0.4234293   3
    # 10        g 0.7750203   3
    # 11        b 0.1418680   2
    # 12        h 0.6632382   4
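The same idea can be tried on the tables from the question, where groups have different numbers of genes. This is only a sketch under a few assumptions: the empty cells in table2 are taken to be NA, only the first data column of table1 is kept, and the names gene_cols and gene_long are made up for illustration. It builds the long table with rep() and unlist() instead of reshape(), which may be easier to follow when the number of gene columns varies.

```r
# Data from the question. Assumption: missing genes in table2 are stored as NA.
table1 <- data.frame(
  genename = c("hggr", "brac4", "fam34"),
  col1     = c(0.554444, 0.333222, 0.111222),
  stringsAsFactors = FALSE
)
table2 <- data.frame(
  genegroupname = c("chr1_45000_46000", "chr1_67000_70000"),
  col1 = c("hggr", "fam34"),
  col2 = c("brac4", NA),
  stringsAsFactors = FALSE
)

# Long format: one row per (group, gene) pair.
# unlist() stacks the gene columns one after another, so the group names
# are repeated once per gene column to stay aligned with the stacked genes.
gene_cols <- setdiff(names(table2), "genegroupname")
gene_long <- data.frame(
  genegroupname = rep(table2$genegroupname, times = length(gene_cols)),
  gene          = unlist(table2[gene_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the padding rows that came from the empty cells.
gene_long <- gene_long[!is.na(gene_long$gene), ]

# match() gives the first matching row of gene_long for each gene,
# or NA when a gene is not in any group.
finalresult <- cbind(
  genegroupname = gene_long$genegroupname[match(table1$genename, gene_long$gene)],
  table1
)
finalresult
```

Note that match() only reports the first hit, so if a gene could belong to more than one group, something like merge(table1, gene_long, by.x = "genename", by.y = "gene") would be needed to keep every pairing.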
r match lookup