Categorizing Documents in Ruby
Paul Dix • New York, NY • Talk

Date: April 21, 2007
Published: not published
Announced: unknown

Text classification is the task of selecting a class or category for a document or block of text. The canonical example is using a Naive Bayes classifier to separate spam from non-spam email. Classifiers can also be used for identifying languages, categorizing news articles or blog posts, detecting trackback, comment, and wiki spam, and more. In my talk I will cover the basics of document classification while focusing on the various tools available in Ruby for each aspect of classification.
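
As a rough illustration of the spam vs. non-spam example mentioned in the abstract, here is a minimal sketch of a Naive Bayes classifier in plain Ruby. It is not taken from the talk and does not use any particular library: the class name NaiveBayes, its methods, the tokenizer, and the tiny training set are all assumptions made for this example, and add-one (Laplace) smoothing is just one common choice.

```ruby
# Minimal multinomial Naive Bayes text classifier (illustrative sketch only;
# the class and method names are hypothetical, not from any gem).
class NaiveBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => word => count
    @doc_counts  = Hash.new(0)                            # category => training documents seen
    @vocabulary  = {}                                     # every word seen during training
  end

  # Lowercase the text and split it into simple word tokens.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  # Record one labeled training example.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the category with the highest log-posterior score.
  def classify(text)
    total_docs = @doc_counts.values.sum
    words = tokenize(text)
    @doc_counts.keys.max_by do |category|
      total_words = @word_counts[category].values.sum
      # Log prior: fraction of training documents in this category.
      score = Math.log(@doc_counts[category].to_f / total_docs)
      # Log likelihood of each word, with add-one (Laplace) smoothing.
      words.each do |word|
        count = @word_counts[category][word]
        score += Math.log((count + 1.0) / (total_words + @vocabulary.size))
      end
      score
    end
  end
end

# Usage with a made-up training set.
classifier = NaiveBayes.new
classifier.train(:spam, "buy cheap pills now")
classifier.train(:spam, "cheap pills online")
classifier.train(:ham,  "meeting at noon tomorrow")
classifier.train(:ham,  "lunch meeting notes")
puts classifier.classify("cheap pills")
```

Working with log probabilities rather than multiplying raw probabilities avoids numeric underflow on longer documents, and the smoothing keeps a word that never appeared in a category from zeroing out that category's score.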

Paul Dix is a computer science student at Columbia University in New York City. Before going back to school in 2005, Paul worked at McAfee as a developer. He has been attending the nyc.rb meetings since October of 2005. Text classification is a subset of Paul’s interests in natural language processing, machine learning, and information retrieval. Last summer he worked as a consultant with EastMedia developing web applications in Ruby on Rails. Paul also attended RailsConf last June and codes in Ruby every chance he gets.

GORUCO 2007
