Package standup.lexicon

This package provides classes and interfaces that handle access and manipulation of the STANDUP lexicon.


Class Summary
Concept A semantic concept which can be viewed as an abstract meaning of one or more words.
CustomLexicon A CustomLexicon is a LexemeSet that has a label and is XMLsaveable.
Dictionary A class that has various static methods for obtaining Lexemes, WordForms and Concepts.
FClass Abstractly, an FClass defines a subset of the STANDUP lexicon based on a threshold of familiarity scores.
Keyword A Keyword is an object appearing in a WordStruct that is not 'canned' or 'filler' text.
Lexeme A Lexeme is a specific sense, or meaning of a word.
LexemeSet A LexemeSet is, appropriately enough, a set of Lexemes.
LexicalComponents A LexicalComponents represents a collection of various CustomLexicons, Topics, and specially designated blacklist CustomLexicons, which logically define the 'working lexicon'.
OptionsGUILexicon A Java Swing-based GUI for editing an OptionsLexicon.
OptionsLexicon A subclass of Options that stores user-specific settings for a ProfileLexicon.
POS A POS object represents a part-of-speech of a Lexeme.
ProfileLexicon A Profile relating to lexical resources.
StructElement A StructElement is an element that can appear within a WordStruct.
Topic A Topic represents a collection of Lexemes that are related in some sense, e.g.
WordForm A WordForm is a word with a unique orthographic and phonetic spelling.
WordSequence Unsurprisingly, a sequence of words.
WordString A simple String representing the orthography of a word.
WordStruct A structure that has a label and a sequence of StructElements, which can be one of Lexeme, WordForm, WordString or WordStruct.

Exception Summary
LexiconException A very simple subclass of Exception that is specifically for Exceptions arising from within the standup.lexicon package.

Package standup.lexicon Description


Abstractly, the STANDUP lexicon is a lexical resource that combines semantic, syntactic, orthographic, and phonetic information (among other kinds) drawn from various freely available resources. It is implemented mainly as a relational database running on the PostgreSQL server. This package provides access to the STANDUP lexicon and adds further functionality, such as managing custom lexicons and organizing lexemes into hierarchies of topics.

Basic lexical classes

The basic lexical classes are shown in the following table.
Name          Description
WordString    A simple String representing the orthography of a word
WordSequence  A sequence of WordStrings representing the orthography of compound words
WordForm      A word with a unique orthographic and phonetic spelling
Concept       A semantic concept, taken directly from WordNet synsets
Lexeme        A specific sense, or meaning, of a word
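To make the relationships in the table concrete, here is a minimal sketch built from hypothetical Java records (WordSequence, WordForm, Concept, Lexeme) plus a hypothetical helper class LexicalModelSketch. These are simplified stand-ins, not the real standup.lexicon classes. The sketch assumes that a Lexeme pairs one WordForm with one Concept, so the two senses of "bank" share a WordForm but differ in Concept; WordSequence holds plain Strings instead of WordString objects, and the phonetic string is an arbitrary placeholder. Records require Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for the basic lexical classes.
// They only illustrate how the classes relate; they are not the STANDUP API.

// Orthography of a (possibly compound) word, e.g. ["bank"] or ["river", "bank"].
record WordSequence(List<String> words) {
    String spelling() { return String.join(" ", words); }
}

// A word with one orthographic and one phonetic spelling (placeholder notation).
record WordForm(WordSequence orthography, String phonetic) {}

// A meaning, modelled here by a gloss string (STANDUP takes these from WordNet synsets).
record Concept(String gloss) {}

// Assumption: one sense of a word = a WordForm paired with a Concept.
record Lexeme(WordForm form, Concept concept) {}

class LexicalModelSketch {
    // Collect the gloss of every lexeme whose spelling matches the given string.
    static List<String> glossesFor(String spelling, List<Lexeme> lexicon) {
        return lexicon.stream()
                .filter(l -> l.form().orthography().spelling().equals(spelling))
                .map(l -> l.concept().gloss())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        WordForm bank = new WordForm(new WordSequence(List.of("bank")), "b ae ng k");
        List<Lexeme> lexicon = List.of(
                new Lexeme(bank, new Concept("sloping land beside a body of water")),
                new Lexeme(bank, new Concept("a financial institution")));
        // Two distinct Lexemes, one shared WordForm.
        System.out.println(glossesFor("bank", lexicon));
    }
}
```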

Obtaining serialized objects through Dictionary

Although the primary source of lexical information is the STANDUP PostgreSQL database, a large portion of the most commonly accessed information is duplicated in serialized files containing hashtables of Lexemes, WordForms, and Concepts. These files are located under /standup/resources/serialized within the main STANDUP .jar file and are accessed through the static methods of the Dictionary class. Reading lexical objects from these serialized files instead of querying the SQL database improves the performance of the Java lexicon API, because it avoids the overhead of database access.

Dictionary provides various static methods for obtaining instances of these serialized lexical objects, such as Dictionary.getSpelledLexemes(), which returns the Lexemes with a given spelling (see the quick usage guide below).
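The general idea of shipping precomputed hashtables as serialized objects can be sketched with standard Java object serialization. The class SerializedIndexSketch, its spelling-to-gloss index, and the data are hypothetical and only illustrate the technique; the actual file names and formats used by Dictionary are not described here, so none are assumed. In a deployed .jar, the InputStream passed to load() would typically come from Class.getResourceAsStream() on a path under /standup/resources/serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.List;

// Hypothetical sketch: build an index once, serialize it, and later load it
// back instead of re-querying a database. Not the STANDUP implementation.
class SerializedIndexSketch {
    // Serialize the index to bytes (a real build step would write a file instead).
    static byte[] save(HashMap<String, List<String>> index) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(index);
        }
        return bytes.toByteArray();
    }

    // Deserialize an index from any stream, e.g. a resource bundled in a .jar.
    @SuppressWarnings("unchecked")
    static HashMap<String, List<String>> load(InputStream in)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream objIn = new ObjectInputStream(in)) {
            return (HashMap<String, List<String>>) objIn.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, List<String>> index = new HashMap<>();
        index.put("bank", List.of("sloping land beside water", "a financial institution"));

        byte[] serialized = save(index);
        HashMap<String, List<String>> loaded = load(new ByteArrayInputStream(serialized));

        // Lookups now hit the in-memory hashtable, not a database.
        System.out.println(loaded.get("bank"));
    }
}
```

The trade-off is the usual one for caches: loading is fast and needs no database connection, but the serialized data only covers what was precomputed, so less common queries still go to PostgreSQL.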

Organizing lexicons

Several classes can be used to organize and arrange Lexemes:
Name           Description
LexemeSet      A set of Lexemes
CustomLexicon  A LexemeSet that has an identifier label and can be saved to/loaded from a file
Topic          A CustomLexicon that may have subtopics, thus forming a hierarchical organization of Lexemes
All the above classes support methods for adding, removing, obtaining, and testing membership of Lexemes.
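A hierarchical topic of this kind can be sketched as a labelled set with child topics. TopicSketch is a hypothetical class, lexemes are represented by plain String identifiers, and the rule that a topic also "contains" the lexemes of its subtopics is an assumption made for illustration; the real Topic class may define membership differently. The sketch also assumes the hierarchy is a tree (no cycles).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified Topic: a labelled set of lexeme IDs with subtopics.
class TopicSketch {
    private final String label;
    private final Set<String> lexemes = new HashSet<>();
    private final List<TopicSketch> subtopics = new ArrayList<>();

    TopicSketch(String label) { this.label = label; }

    String getLabel() { return label; }
    void addLexeme(String lexeme) { lexemes.add(lexeme); }
    boolean removeLexeme(String lexeme) { return lexemes.remove(lexeme); }
    void addSubtopic(TopicSketch sub) { subtopics.add(sub); }

    // Assumption: a lexeme belongs to a topic if it is in the topic itself
    // or in any of its (transitive) subtopics. Assumes no cycles.
    boolean contains(String lexeme) {
        if (lexemes.contains(lexeme)) return true;
        for (TopicSketch sub : subtopics) {
            if (sub.contains(lexeme)) return true;
        }
        return false;
    }

    // Flatten the whole subtree into one set.
    Set<String> allLexemes() {
        Set<String> result = new HashSet<>(lexemes);
        for (TopicSketch sub : subtopics) result.addAll(sub.allLexemes());
        return result;
    }
}
```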

LexicalComponents and the working lexicon

The LexicalComponents class provides functionality for organizing abstract, composite working lexicons for individual users. Users can register CustomLexicons, Topics, and designated blacklist CustomLexicons with a LexicalComponents and then obtain the working lexicon by calling LexicalComponents.getWorkingLexicon(). This returns the union of all component CustomLexicons and Topics minus the union of all blacklist CustomLexicons.
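The set arithmetic behind the working lexicon can be illustrated with java.util.Set. WorkingLexiconSketch is a hypothetical helper operating on sets of String lexeme IDs, not the real LexicalComponents API. Removing each blacklist in turn gives the same result as removing the union of all blacklists, so the union of the blacklists never has to be built explicitly.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the working-lexicon rule:
// (union of all components) minus (union of all blacklists).
class WorkingLexiconSketch {
    static Set<String> workingLexicon(List<Set<String>> components,
                                      List<Set<String>> blacklists) {
        Set<String> working = new HashSet<>();
        for (Set<String> component : components) {
            working.addAll(component);        // union of CustomLexicons and Topics
        }
        for (Set<String> blacklist : blacklists) {
            working.removeAll(blacklist);     // subtract each blacklist
        }
        return working;
    }
}
```

For example, components {"bank_1", "cat_1"} and {"cat_1", "dog_1"} with a single blacklist {"bank_1"} yield a working lexicon of {"cat_1", "dog_1"}.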

LexicalComponents implements the Profileable interface, so it stores its user-specific details in ProfileLexicon and OptionsLexicon objects. See the standup.profiling documentation for more details, in particular ProfileManager, which describes how to create and access user profiles.

Putting words together

The standup.lexicon package contains several classes and interfaces that support combining words into structures such as sentences. Strictly speaking, this is probably beyond the scope of a lexicon package, and in practice it exists wholly to support the joke generation functionality in the standup.joke package. It is included in standup.lexicon because its definition is closely coupled to that of the basic lexical classes.

These classes and interfaces (e.g. WordStruct and StructElement in the class summary) may also be useful outside the context of joke generation.
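According to the class summary, a WordStruct has a label and a sequence of StructElements, and a StructElement may itself be a WordStruct. That is the classic composite pattern, sketched below with hypothetical types (StructElementSketch, WordStringSketch, WordStructSketch) that stand in for the real classes; the render() method, which joins leaf words with spaces and ignores labels, is an assumption for illustration only. Records require Java 16 or later.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical composite: anything that can appear inside a structure.
interface StructElementSketch {
    String render();
}

// Leaf element: a plain word.
record WordStringSketch(String text) implements StructElementSketch {
    public String render() { return text; }
}

// Composite element: a label plus a sequence of elements, which may nest.
record WordStructSketch(String label, List<StructElementSketch> elements)
        implements StructElementSketch {
    // Render by joining the rendered children with spaces; the label is
    // structural metadata and is not printed.
    public String render() {
        StringJoiner joiner = new StringJoiner(" ");
        for (StructElementSketch element : elements) {
            joiner.add(element.render());
        }
        return joiner.toString();
    }
}
```

Because a WordStructSketch is itself a StructElementSketch, a phrase such as "the bank" can be built once and nested inside a larger sentence structure.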

Quick usage guide

The easiest way to start working with the lexicon API is to obtain some instances through Dictionary. The following code obtains all Lexemes spelt "bank" and prints each of their meanings.
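import java.util.Iterator;
import standup.lexicon.*;
import standup.profiling.ProfileManager; // used in the profile example below

// Look up every Lexeme whose spelling is "bank"
LexemeSet bankLexemes = Dictionary.getSpelledLexemes(new WordSequence("bank"));

// Print a brief gloss for each sense
for (Iterator<Lexeme> iter = bankLexemes.getLexemesIterator(); iter.hasNext();)
{
  Lexeme lexeme = iter.next();
  System.out.println("The meaning of "+lexeme+" is "+lexeme.getConcept().getBriefGloss());
}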
Let us suppose we are working with a user who for some reason does not want to have any sense of the word "bank" in their lexicon. We can create a custom LexicalComponents for them as follows:
// Create a profile for the user and make it the active one
LexicalComponents lc = new LexicalComponents();
ProfileManager.initialize();
ProfileManager.createUser("username",lc);
ProfileManager.useProfile("username",lc);

// Blacklist every "bank" lexeme obtained above
CustomLexicon myBlacklist = new CustomLexicon("myblack",bankLexemes.getLexemes());
lc.getProfile().addBlacklist(myBlacklist);

// Working lexicon = union of components minus union of blacklists
LexemeSet usersLexicon = lc.getWorkingLexicon();

System.out.println("STANDUP dictionary has "+Dictionary.getAllLexemes().getLexemeCount()+" entries.");
System.out.println("User's working lexicon has "+usersLexicon.getLexemeCount()+" entries.");

// Save the profile to disk
ProfileManager.saveCurrentProfile(lc);
The code above creates a profile for the user "username", builds a blacklist containing the "bank" lexemes and adds it to that user's profile, then obtains the user's working lexicon, prints some statistics, and saves the profile to disk.