How to filter documents in a tm corpus in R based on metadata?
I am using the R tm package and trying to select documents by index and by metadata:
    orbit_corpus <- Corpus(tm_corpus, readerControl = list(reader = myreader))
    meta(orbit_corpus[[1]])
    author : a8
    origin : department
    heading : whib
    id : 1
    year : 2013

I would like to find which of the first hundred documents of the corpus were published in 2013. This works to check whether the metadata 'year' of document 1 is 2013:

    meta(orbit_corpus[[1]], "year") == 2013
    [1] TRUE

But I need something that lets me find which of the first 100 indexes meet this criterion. I imagine something similar to the following (which does not work and, unfortunately, does not generate a list of documents):

    meta(orbit_corpus[[1:100]], "year") == 2013
    Error in x$content[[i]] : recursive indexing failed at level 4

Many thanks for your help!
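You can use tm_filter on the first 100 documents of the corpus (orbit_corpus[1:100]). Note the single brackets: [1:100] returns a sub-corpus, whereas [[1:100]] attempts recursive indexing into one document, which is what triggers the error above.

    tm_filter(orbit_corpus[1:100], FUN = function(x) meta(x)[["year"]] == "2013")

From the documentation:

    tm_filter returns a corpus containing documents where FUN matches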
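
If the goal is the matching index positions rather than a filtered corpus, tm also provides tm_index, which takes the same kind of predicate and returns a logical vector. Below is a minimal sketch on a toy corpus. The names toy_corpus, years and is_2013 and the document texts are made up for illustration and do not come from the question, and the "year" tag is attached by hand here because the original custom reader is not shown.

```r
library(tm)

# Toy corpus standing in for orbit_corpus (texts are invented for illustration)
texts <- c("first report", "second report", "third report")
toy_corpus <- VCorpus(VectorSource(texts))

# Attach a hypothetical "year" tag to each document's metadata
years <- c(2013, 2012, 2013)
for (i in seq_along(toy_corpus)) {
  meta(toy_corpus[[i]], "year") <- years[i]
}

# Predicate applied to one document at a time
is_2013 <- function(x) meta(x, "year") == 2013

# Logical vector with one entry per document in the (sub-)corpus
hits <- tm_index(toy_corpus[1:3], FUN = is_2013)

# Positions of the matching documents within that range
matching_positions <- unname(which(hits))

# Sub-corpus holding only the matching documents
docs_2013 <- tm_filter(toy_corpus[1:3], FUN = is_2013)
```

On the real corpus the same pattern would presumably be tm_index(orbit_corpus[1:100], FUN = is_2013), assuming every document carries a "year" tag. If the reader stores the year as a character string, the comparison with 2013 still works because R coerces the number to "2013" before comparing.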