Machine learning for language toolkit
Once you have downloaded and installed MALLET, the easiest way to get started is through the mallet script. If you installed MALLET in the directory ~/Applications/mallet
, this script will be in ~/Applications/mallet/bin
. The following instructions assume that your current working directory is the MALLET directory. To use the script, specify a command and then some number of options, in this pattern:
bin/mallet [command] --option value --option value ...
Type bin/mallet
to get a list of commands, and use the option --help
with any command to get a description of valid options.
Data Import: To load all documents in specified directories into a MALLET data file, with class labels specified by the directory, use the command
bin/mallet import-dir --input [dir1] [dir2] [...] --output data.mallet
For more information, and many more options, see the data import quick start guide.
Classification: To evaluate MaxEnt and Naïve Bayes classifiers trained on this data using 10-fold cross validation, use the command
bin/mallet train-classifier --input data.mallet \
--trainer MaxEnt --trainer NaiveBayes \
--training-portion 0.9 --num-trials 10
This command will run 10 trials, in which the input data is randomly split into 90% training instances and 10% testing instances. For each trial, MALLET trains a MaxEnt classifier and a Naïve Bayes classifier on the training instances, then prints accuracy results and a matrix of correct and predicted labels for each classifier. For more information about training and evaluating classifiers, see the classification quick start guide.