How to filter documents in a tm corpus in R based on metadata?
I am using the R tm package and am trying to select documents by index and metadata:
orbit_corpus <- Corpus(tm_corpus, readerControl = list(reader = myreader))
meta(orbit_corpus[[1]])
author : a8
origin : department
heading : whib
id : 1
year : 2013
I want to find which of the first hundred documents of the corpus were published in 2013. The following works to check whether the 'year' metadata of document 1 is 2013:
meta(orbit_corpus[[1]], "year") == 2013
[1] TRUE
What I need is something that finds, among the first 100 indexes, the documents that meet this criterion. I imagined something like the following, but unfortunately it does not work and does not generate a list of documents:
meta(orbit_corpus[[1:100]], "year") == 2013
Error in x$content[[i]] : recursive indexing failed at level 4
Many thanks for any help!
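You can use tm_filter on the first 100 documents of the corpus (orbit_corpus[1:100]). Note the single brackets: orbit_corpus[[1:100]] asks [[ ]] for recursive indexing into a single element, which is what produced the error above, whereas [ ] returns a sub-corpus.

tm_filter(orbit_corpus[1:100], FUN = function(x) meta(x)[["year"]] == "2013")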
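A minimal, self-contained sketch of the same tm_filter call, assuming a small VCorpus built with VectorSource in place of orbit_corpus (toy_corpus, its texts, and the years are hypothetical), so the filter can be tried without the original reader. In current tm versions the predicate argument is named FUN.

```r
library(tm)

# Hypothetical toy corpus standing in for orbit_corpus
docs <- c("first text", "second text", "third text", "fourth text")
toy_corpus <- VCorpus(VectorSource(docs))

# Attach a hypothetical "year" metadata tag to each document
years <- c(2012, 2013, 2013, 2014)
for (i in seq_along(years)) {
  meta(toy_corpus[[i]], "year") <- years[i]
}

# Single brackets subset the corpus; tm_filter keeps documents where FUN is TRUE
subset_2013 <- tm_filter(toy_corpus[1:3], FUN = function(x) meta(x)[["year"]] == 2013)
length(subset_2013)  # 2 (the 2nd and 3rd documents)
```

Comparing with 2013 or "2013" gives the same result here, since R coerces the number to a string when the stored value is character.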
From the documentation: tm_filter returns a corpus containing documents where FUN matches.
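Since the question asks which indexes meet the criterion, tm_index, tm's companion to tm_filter, may be a closer fit: it returns one TRUE/FALSE value per document instead of a corpus, and base R's which() turns that into positions. A sketch on a hypothetical toy corpus (names and years are illustrative only):

```r
library(tm)

# Hypothetical toy corpus with a "year" tag on every document
toy_corpus <- VCorpus(VectorSource(c("a", "b", "c", "d")))
years <- c(2013, 2012, 2013, 2013)
for (i in seq_along(years)) {
  meta(toy_corpus[[i]], "year") <- years[i]
}

# One logical value per document in the subset
is_2013 <- tm_index(toy_corpus[1:3], FUN = function(x) meta(x)[["year"]] == 2013)

# Convert the logical vector into document positions
idx_2013 <- which(is_2013)
idx_2013
```

Because the subset starts at document 1, these positions are also the indexes in the full corpus. The predicate assumes every document carries a year tag.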