Machine learning for language toolkit


Once you have downloaded and installed MALLET, the easiest way to get started is through the mallet script. If you installed MALLET in the directory ~/Applications/mallet, this script will be in ~/Applications/mallet/bin. The following instructions assume that your current working directory is the MALLET directory. To use the script, specify a command and then some number of options, in this pattern:

bin/mallet [command] --option value --option value ...

Type bin/mallet to get a list of commands, and use the option --help with any command to get a description of valid options.

Data Import: To load all documents in one or more directories into a single MALLET data file, treating each file as one document and taking its class label from the directory that contains it, use the command

bin/mallet import-dir --input [dir1] [dir2] [...] --output data.mallet

For more information, and many more options, see the data import quick start guide.
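
As a rough illustration of the directory layout that import-dir expects, the sketch below builds a tiny two-class corpus and imports it. The sample-data directory, the spam and ham class names, and the file contents are all hypothetical and chosen only for this example; the import itself runs only if bin/mallet is found in the current directory.

```shell
#!/bin/sh
# Hypothetical corpus: one directory per class, one file per document.
mkdir -p sample-data/spam sample-data/ham
printf 'win a free prize now\n'        > sample-data/spam/1.txt
printf 'claim your free offer today\n' > sample-data/spam/2.txt
printf 'meeting moved to monday\n'     > sample-data/ham/1.txt
printf 'lunch at noon tomorrow\n'      > sample-data/ham/2.txt

# Import both class directories into one MALLET data file.
# The guard lets the sketch run even where MALLET is not installed.
if [ -x bin/mallet ]; then
  bin/mallet import-dir \
    --input sample-data/spam sample-data/ham \
    --output data.mallet
else
  echo "bin/mallet not found; run this from the MALLET directory" >&2
fi
```

Each directory passed to --input contributes one class, so adding a third category only requires a third directory on the command line.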

Classification: To evaluate MaxEnt and Naïve Bayes classifiers trained on this data over 10 random 90%/10% train/test splits, use the command

bin/mallet train-classifier --input data.mallet \
  --trainer MaxEnt --trainer NaiveBayes \
  --training-portion 0.9 --num-trials 10

This command runs 10 trials. In each trial, the input data is randomly split into 90% training instances and 10% testing instances. MALLET then trains a MaxEnt classifier and a Naïve Bayes classifier on the training instances, and for each classifier prints accuracy results and a confusion matrix of true versus predicted labels. For more information about training and evaluating classifiers, see the classification quick start guide.
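
To make the repeated random split concrete, here is a small shell sketch of the idea behind --training-portion 0.9 with --num-trials 10. It does not call MALLET and is not MALLET's own splitting code; the instance count of 20, the per-trial seeds, and the integer rounding are assumptions made for the illustration.

```shell
#!/bin/sh
# Illustration only: repeated random 90/10 splits over hypothetical instance IDs.
total=20                          # hypothetical number of instances
train_size=$((total * 9 / 10))    # 90% of 20 = 18 training instances

trial=1
while [ "$trial" -le 10 ]; do
  # Shuffle the IDs 1..total by sorting on a random key (seeded per trial).
  shuffled=$(awk -v n="$total" -v seed="$trial" \
    'BEGIN { srand(seed); for (i = 1; i <= n; i++) print rand() "\t" i }' \
    | sort -n | cut -f2)

  # The first 90% of the shuffled IDs train the model; the rest test it.
  train=$(printf '%s\n' "$shuffled" | head -n "$train_size")
  test_set=$(printf '%s\n' "$shuffled" | tail -n "$((total - train_size))")

  n_train=$(printf '%s\n' "$train" | wc -l | tr -d ' ')
  n_test=$(printf '%s\n' "$test_set" | wc -l | tr -d ' ')
  echo "trial $trial: $n_train train, $n_test test"
  trial=$((trial + 1))
done
```

Every trial reports 18 training and 2 testing instances, but which instances land in each set differs between trials. Because the splits are drawn independently, an instance can appear in the test set of several trials or of none, which is what distinguishes this scheme from k-fold cross-validation.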