Tf-idf, or term frequency-inverse document frequency, is a statistic that indicates how important a word is to a document within a collection of documents. This lesson explains term frequency and inverse document frequency, and shows how to use tf-idf to identify the most relevant words in a body of text.
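To make the two parts concrete, here is a minimal sketch in plain JavaScript. It assumes the textbook definitions: term frequency is the share of a document's tokens that match the term, and inverse document frequency is ln(N / df), where N is the number of documents and df is the number of documents containing the term. The helper names (tokenize, termFrequency, inverseDocumentFrequency, tfIdf) are hypothetical and chosen for illustration. Libraries such as natural use their own weighting, smoothing, and stopword handling, so their scores will generally differ from these.

```javascript
// Minimal tf-idf sketch using textbook definitions (an assumption;
// real libraries apply different smoothing and stopword rules).

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Term frequency: share of the document's tokens that equal `term`.
function termFrequency(term, tokens) {
  if (tokens.length === 0) return 0;
  var count = tokens.filter(function (t) { return t === term; }).length;
  return count / tokens.length;
}

// Inverse document frequency: ln(N / df), or 0 if no document has the term.
function inverseDocumentFrequency(term, corpus) {
  var df = corpus.filter(function (tokens) {
    return tokens.indexOf(term) !== -1;
  }).length;
  return df === 0 ? 0 : Math.log(corpus.length / df);
}

// tf-idf of `term` in the document at `docIndex` of a tokenized corpus.
function tfIdf(term, docIndex, corpus) {
  return termFrequency(term, corpus[docIndex]) *
    inverseDocumentFrequency(term, corpus);
}

var corpus = [
  'this document is about node.',
  'this document is about ruby.',
  'this document is about ruby and node.'
].map(tokenize);

console.log('document: ' + tfIdf('document', 0, corpus));
console.log('node: ' + tfIdf('node', 0, corpus));
```

Under these definitions a word that appears in every document, such as "document" here, scores 0 because ln(3 / 3) is 0, while "node" scores above 0 in the first document because it is absent from one of the three.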
Compute the tf-idf of specific words for each document:
    var natural = require('natural');
    var TfIdf = natural.TfIdf;
    var tfidf = new TfIdf();

    // Each call adds one document to the collection.
    tfidf.addDocument('this document is about node.');
    tfidf.addDocument('this document is about ruby.');
    tfidf.addDocument('this document is about ruby and node.');

    // Score the terms 'node' and 'ruby' against every document.
    tfidf.tfidfs('node ruby', function(i, measure) {
      console.log('document #' + i + ' is ' + measure);
    });

    /*
    document #0 is 1
    document #1 is 1
    document #2 is 2
    */
List the most important words in a document:
    tfidf.listTerms(0 /* document index */).forEach(function(item) {
      console.log(item.term + ': ' + item.tfidf);
    });