Quiz

Motivation

All the news messages in the 20 newsgroups test corpus were created during 1993 and were openly accessible on Usenet, a part of the internet. The 20 newsgroups corpus itself has been publicly available as a downloadable archive since at least 1997 (from Tom Mitchell's site accompanying his well-known book "Machine Learning").

Nevertheless, being confused about the subtleties of national and global data privacy protection rules, I decided to protect the message text body with a quiz. In this way I want to make sure that the data is used exclusively for educational purposes.

Questions

Question 1

Father of IR


Question 2

20 newsgroups archive file

What is the highest ID (file name) in the 20 newsgroups archive?




Question 3

Document vector model of IR

Consider the following toy document collection consisting of only two documents:

Use the tf-idf variant "bnn.bnn" (SMART notation) to calculate the similarity between doc1 and doc2 with the inner product.

What is the correct value?
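
As a reminder of what the notation means, here is a minimal Python sketch of the "bnn.bnn" weighting. In SMART notation "bnn" stands for binary term frequency, no idf weighting and no normalization, so every distinct term in a document gets weight 1 and the inner product simply counts the distinct terms that both documents share. The two example strings and the function names below are hypothetical and are not the quiz documents; splitting on whitespace is an assumed tokenization.

```python
def binary_vector(text):
    """Distinct lowercase terms of a text.

    Represents a "bnn" vector: binary tf (a term counts once),
    no idf weighting, no length normalization.
    Assumption: tokens are separated by whitespace.
    """
    return set(text.lower().split())


def bnn_inner_product(doc_a, doc_b):
    """Inner product of two "bnn" vectors = number of shared distinct terms."""
    return len(binary_vector(doc_a) & binary_vector(doc_b))


# Hypothetical example documents (NOT the documents from the quiz).
example_a = "the cat sat on the mat"
example_b = "the dog sat on the log"

similarity = bnn_inner_product(example_a, example_b)
print(similarity)  # shared distinct terms: "the", "sat", "on"
```

Note that repeated words do not change the result, because the binary weighting ignores how often a term occurs.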