Class Summary | |
---|---|
Concept | A semantic concept which can be viewed as an abstract meaning of one or more words. |
CustomLexicon | A CustomLexicon is a LexemeSet that has a label and is XMLsaveable. |
Dictionary | A class that has various static methods for obtaining Lexemes, WordForms and Concepts. |
FClass | Abstractly, an FClass defines a subset of the STANDUP lexicon based on a threshold of familiarity scores. |
Keyword | A Keyword is an object appearing in a WordStruct that is not 'canned' or 'filler' text. |
Lexeme | A Lexeme is a specific sense, or meaning, of a word. |
LexemeSet | A LexemeSet is, appropriately enough, a set of Lexemes. |
LexicalComponents | A LexicalComponents represents a collection of various CustomLexicons, Topics, and specially designated blacklist CustomLexicons, which logically define the 'working lexicon'. |
OptionsGUILexicon | A Java Swing-based GUI for editing an OptionsLexicon. |
OptionsLexicon | A subclass of Options that stores user-specific settings for a ProfileLexicon. |
POS | A POS object represents a part-of-speech of a Lexeme. |
ProfileLexicon | A Profile relating to lexical resources. |
StructElement | A StructElement is an element that can appear within a WordStruct. |
Topic | A Topic represents a collection of Lexemes that are related in some sense. |
WordForm | A WordForm is a word with a unique orthographic and phonetic spelling. |
WordSequence | Unsurprisingly, a sequence of words. |
WordString | A simple String representing the orthography of a word. |
WordStruct | A structure that has a label and a sequence of StructElements, which can be one of Lexeme, WordForm, WordString or WordStruct. |
Exception Summary | |
---|---|
LexiconException | A very simple subclass of Exception that is specifically for Exceptions arising from within the standup.lexicon package. |
This package provides classes and interfaces for accessing and manipulating the STANDUP lexicon.
The STANDUP lexicon is, abstractly, a lexical resource that combines semantic, syntactic, orthographic, and phonetic information, among others, from various freely available resources. It is mainly implemented as a relational database running on the PostgreSQL server software. This package provides access to the STANDUP lexicon and adds extra functionality, such as the ability to manage custom lexicons and to organize lexemes into hierarchies of topics.
Name | Description |
---|---|
WordString | A simple String representing the orthography of a word |
WordSequence | A sequence of WordStrings representing the orthography of compound words |
WordForm | A word with a unique orthographic and phonetic spelling |
Concept | A semantic concept, taken directly from WordNet synsets |
Lexeme | A specific sense, or meaning, of a word |
Dictionary
Although the primary source of lexical information is the STANDUP PostgreSQL database, a large portion of the most commonly accessed information is duplicated within serialized files containing hashtables of Lexemes, WordForms, and Concepts. These files are found under /standup/resources/serialized within the main STANDUP .jar file, and are accessed through the static Dictionary class. These serialized objects improve the performance of the Java lexicon API because they reduce the overhead involved in accessing the SQL database.
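The caching idea described above can be illustrated with plain Java serialization. This is only a minimal sketch and not STANDUP's actual loading code: the class name SerializedCacheSketch, the String-valued table, and the in-memory byte array standing in for a file under /standup/resources/serialized are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Hashtable;

// Hypothetical illustration: a Hashtable is serialized once and later
// read back in a single step, instead of querying a database per entry.
class SerializedCacheSketch {

    // Write the whole table to a byte array (stands in for a serialized file).
    static byte[] serialize(Hashtable<String, String> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        }
        return bytes.toByteArray();
    }

    // Read the table back; the cast is unchecked because serialization erases generics.
    @SuppressWarnings("unchecked")
    static Hashtable<String, String> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Hashtable<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Hashtable<String, String> lexemes = new Hashtable<>();
        lexemes.put("lx026692", "bank"); // ID and spelling taken from the WordStruct example
        Hashtable<String, String> cache = deserialize(serialize(lexemes));
        System.out.println(cache.get("lx026692"));
    }
}
```

Once the table has been deserialized, each lookup by ID is an in-memory hash lookup, which is the kind of saving the paragraph above attributes to the serialized files.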
Dictionary is a class containing various static methods that can be used to obtain instances of these serialized lexical objects, e.g.:
Dictionary.getLexeme(String): get Lexeme by ID.
Dictionary.getWordForm(String): get WordForm by ID.
Dictionary.getConcept(String): get Concept by ID.
Dictionary.getAllLexemes(): get all known Lexemes.
Dictionary.getAllWordForms(): get all known WordForms.
Dictionary.getAllConcepts(): get all known Concepts.
Dictionary.getSpelledWordForms(WordSequence): get WordForms by spelling.
Dictionary.getSpelledLexemes(WordSequence): get Lexemes by spelling.
Dictionary.getSpelledLexemes(WordForm): get Lexemes by WordForm.
Several classes represent collections of Lexemes, i.e.:
Name | Description |
---|---|
LexemeSet | A set of Lexemes |
CustomLexicon | A LexemeSet that has an identifier label and can be saved to/loaded from a file. |
Topic | A CustomLexicon that may have subtopics, thus forming a hierarchical organization of Lexemes. |
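To make the idea of a hierarchy of topics concrete, here is a small sketch using plain strings in place of Lexemes. The class TopicSketch and its methods are hypothetical and are not the Topic API; in particular, treating a topic's contents as its own lexemes plus those of all its subtopics is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Topic: a labelled set of lexeme IDs with subtopics.
class TopicSketch {
    final String label;
    final Set<String> lexemeIds = new LinkedHashSet<>();
    final List<TopicSketch> subtopics = new ArrayList<>();

    TopicSketch(String label, String... ids) {
        this.label = label;
        lexemeIds.addAll(Arrays.asList(ids));
    }

    TopicSketch addSubtopic(TopicSketch child) {
        subtopics.add(child);
        return this;
    }

    // Collect this topic's own entries plus those of every subtopic, depth-first.
    // Assumption: a parent topic logically contains its subtopics' entries.
    Set<String> allLexemeIds() {
        Set<String> all = new LinkedHashSet<>(lexemeIds);
        for (TopicSketch sub : subtopics) {
            all.addAll(sub.allLexemeIds());
        }
        return all;
    }

    public static void main(String[] args) {
        // Placeholder words, not real STANDUP lexeme IDs.
        TopicSketch animals = new TopicSketch("animals", "animal");
        animals.addSubtopic(new TopicSketch("pets", "cat", "dog"));
        animals.addSubtopic(new TopicSketch("farm", "cow", "dog"));
        System.out.println(animals.allLexemeIds()); // [animal, cat, dog, cow]
    }
}
```

Because the result is a set, an entry that appears under several subtopics ("dog" here) is counted once.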
LexicalComponents and the working lexicon
The LexicalComponents class provides functionality for organising abstract, composite, working lexicons for individual users. Users may register CustomLexicons, Topics and designated blacklist CustomLexicons with a LexicalComponents and then obtain the working lexicon by calling LexicalComponents.getWorkingLexicon(), which returns the union of all component CustomLexicons and Topics minus the union of all blacklist CustomLexicons.
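The set arithmetic behind the working lexicon can be sketched with java.util.Set. This is an illustration under assumptions, not the implementation of getWorkingLexicon(): the class WorkingLexiconSketch is hypothetical, lexemes are represented by placeholder strings, and components and blacklists are passed in as plain lists of sets.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of "union of components minus union of blacklists".
class WorkingLexiconSketch {

    static Set<String> workingLexicon(List<Set<String>> components,
                                      List<Set<String>> blacklists) {
        Set<String> result = new LinkedHashSet<>();
        // Union of all component lexicons and topics.
        for (Set<String> component : components) {
            result.addAll(component);
        }
        // Removing each blacklist in turn equals subtracting their union.
        for (Set<String> blacklist : blacklists) {
            result.removeAll(blacklist);
        }
        return result;
    }

    static Set<String> setOf(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        // Placeholder entries, not real STANDUP lexeme IDs.
        Set<String> customLexicon = setOf("bank#1", "bank#2", "river#1");
        Set<String> topic = setOf("river#1", "money#1");
        Set<String> blacklist = setOf("bank#1", "bank#2");

        Set<String> working = workingLexicon(
                Arrays.asList(customLexicon, topic),
                Arrays.asList(blacklist));
        System.out.println(working); // [river#1, money#1]
    }
}
```

A consequence of this definition is that a blacklisted entry is excluded even if several components contain it.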
LexicalComponents implements the Profileable interface, so it stores user-specific details in the ProfileLexicon and OptionsLexicon classes. See the standup.profiling documentation for more details, in particular ProfileManager, which shows how to create and access user profiles.
The standup.lexicon package contains several classes and interfaces that support combining words together to form structures such as sentences. Strictly speaking, this is probably beyond the scope of a lexicon package, and in practice it is wholly intended to support the joke generation functionality found within the standup.joke package, but it is included in the standup.lexicon package as its definition is very closely coupled to that of the basic lexical classes.
The following classes and interfaces are also conceivably useful outside the context of joke generation.
WordStruct: a WordStruct is a structure that has a label and a sequence of StructElements, which can be one of Lexeme, WordForm, WordString or WordStruct. This enables the construction of hierarchical structures, for example:
WordStruct s = <s, [np, vp, '.']>
WordStruct np = <np, ['the',lx026692]>
WordStruct vp = <vp, ['is',v]>
WordStruct v = <v, [lx182781]>
Lexeme lx026692 = "bank"
Lexeme lx182781 = "closed"
StructElement: as mentioned above, a StructElement is an element that can appear within a WordStruct, and can be one of Lexeme, WordForm, WordString or WordStruct.
Keyword: a Keyword is a subinterface of StructElement that, roughly speaking, stands for anything in a sentence that is not 'canned' or 'filler' text. Its significance relates to the joke generation functionality in the standup.joke package, where Keywords are obtained through Schema or Clause instantiation and thus must be handled separately from canned text.
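The hierarchical "the bank is closed." structure shown under WordStruct can be mimicked with a small composite. This is a sketch under assumptions: ElementSketch, WordSketch and StructSketch are hypothetical stand-ins for StructElement, the leaf types (Lexeme, WordForm, WordString) and WordStruct, and the flatten method is not part of the standup.lexicon API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for StructElement.
interface ElementSketch {
    void collectWords(List<String> out);
}

// Leaf element: stands in for a Lexeme, WordForm or WordString.
class WordSketch implements ElementSketch {
    private final String text;

    WordSketch(String text) { this.text = text; }

    public void collectWords(List<String> out) { out.add(text); }
}

// Labelled sequence of elements: stands in for a WordStruct.
class StructSketch implements ElementSketch {
    private final String label;
    private final List<ElementSketch> elements;

    StructSketch(String label, ElementSketch... elements) {
        this.label = label;
        this.elements = Arrays.asList(elements);
    }

    String getLabel() { return label; }

    // Visit children in order; nested structs recurse.
    public void collectWords(List<String> out) {
        for (ElementSketch e : elements) {
            e.collectWords(out);
        }
    }
}

class WordStructDemo {
    // Builds <s, [np, vp, '.']> with np = ['the', bank] and vp = ['is', <v, [closed]>].
    static StructSketch buildSentence() {
        StructSketch v = new StructSketch("v", new WordSketch("closed"));
        StructSketch vp = new StructSketch("vp", new WordSketch("is"), v);
        StructSketch np = new StructSketch("np", new WordSketch("the"), new WordSketch("bank"));
        return new StructSketch("s", np, vp, new WordSketch("."));
    }

    // Flatten the tree into a space-separated string of its leaves.
    static String flatten(ElementSketch root) {
        List<String> words = new ArrayList<>();
        root.collectWords(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(flatten(buildSentence())); // the bank is closed .
    }
}
```

Letting a struct contain other structs as elements is what makes the nesting possible; the leaves carry the actual words.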
Dictionary. The following code obtains all Lexemes spelt "bank" and prints out all their different meanings.
import java.util.Iterator;
import standup.lexicon.*;

// Look up every Lexeme spelt "bank" and print a brief gloss of each sense.
LexemeSet bankLexemes = Dictionary.getSpelledLexemes(new WordSequence("bank"));
for (Iterator<Lexeme> iter = bankLexemes.getLexemesIterator(); iter.hasNext();)
{
    Lexeme lexeme = iter.next();
    System.out.println("The meaning of " + lexeme + " is " + lexeme.getConcept().getBriefGloss());
}
Let us suppose we are working with a user who for some reason does not want to have any sense of the word "bank" in their lexicon. We can create a custom LexicalComponents for them as follows:
import standup.profiling.*;

// Create a LexicalComponents and a profile for the user.
LexicalComponents lc = new LexicalComponents();
ProfileManager.initialize();
ProfileManager.createUser("username", lc);
ProfileManager.useProfile("username", lc);

// Blacklist every sense of "bank" (bankLexemes comes from the previous example).
CustomLexicon myBlacklist = new CustomLexicon("myblack", bankLexemes.getLexemes());
lc.getProfile().addBlacklist(myBlacklist);

// Compare the full dictionary with the user's working lexicon, then save the profile.
LexemeSet usersLexicon = lc.getWorkingLexicon();
System.out.println("STANDUP dictionary has " + Dictionary.getAllLexemes().getLexemeCount() + " entries.");
System.out.println("User's working lexicon has " + usersLexicon.getLexemeCount() + " entries.");
ProfileManager.saveCurrentProfile(lc);
The above code creates a profile for user "username", and creates and adds to the user's profile a blacklist containing the "bank" lexemes. It then obtains the 'working lexicon', prints some statistics, and saves the profile to disk.