standup.lexicon
Class Lexeme

java.lang.Object
  extended by standup.lexicon.StructElement
      extended by standup.lexicon.Keyword
          extended by standup.lexicon.Lexeme
All Implemented Interfaces:
Serializable, Comparable, SQLSelectElement, Unifiable, XMLizable

public class Lexeme
extends Keyword
implements Comparable, Serializable

A Lexeme is a specific sense, or meaning, of a word.

Instances of this class are obtained by calling Dictionary.getLexeme(String), which accesses a serialized hashtable, and thus does not require access to the SQL lexical database.
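
The serialized-hashtable lookup described above can be pictured with standard Java object serialization. The following is a minimal sketch, not the STANDUP implementation: the class SerializedTableSketch, the String values and the in-memory byte array are hypothetical stand-ins for the real Lexeme cache file, whose format is not documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a serialized lookup table: the map is written once
// (as a build step would) and read back later, so lookups need no database.
final class SerializedTableSketch {
    // Serialize the table to bytes; a real cache would write to a file instead.
    static byte[] write(HashMap<String, String> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        }
        return bytes.toByteArray();
    }

    // Deserialize the table; the unchecked cast is safe only because write() produced it.
    @SuppressWarnings("unchecked")
    static Map<String, String> read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Map<String, String>) in.readObject();
        }
    }
}
```

Once the table has been read back, a call such as read(bytes).get("lx081537") answers from memory with no SQL query, which is the property the paragraph above attributes to Dictionary.getLexeme(String).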

Lexemes are essentially read-only objects: they cannot be altered.

Author:
Ruli Manurung
See Also:
Serialized Form

Field Summary
private  int ambiguityCount
           
private  Concept cachedConcept
           
private  WordForm cachedHeadWordForm
           
private  List<String> cachedImageFiles
           
private  WordForm cachedModifierWordForm
           
private  SymbolType[] cachedSymbolSet
           
private  WordForm cachedWordForm
           
private  String conceptID
           
private  float fScore
           
private  String headWordFormID
           
private  String id
           
private  boolean isNounCompound
           
private  String modifierWordFormID
           
private  POS pos
           
private  String[] realConceptCodes
           
private  int semcorFreq
           
private static long serialVersionUID
           
private  String wordFormID
           
 
Constructor Summary
Lexeme(String id, int ambiguityCount, String wordFormID, String conceptID, int semcorFreq, float fScore, String[] realConceptCodes, POS pos, boolean isNounCompound, String headWordFormID, String modifierWordFormID)
          Constructor that provides all necessary details.
 
Method Summary
 int compareTo(Object arg0)
          This implementation of compareTo is consistent with equals(Object).
 boolean equals(Object obj)
           
 int getAmbiguityCount()
          Returns the number of Lexemes that share this Lexeme's orthography (count is inclusive of this Lexeme).
 Concept getConcept()
          Returns the Concept associated with this Lexeme.
 float getFamiliarityScore()
          Returns the F-score, or familiarity score, of this Lexeme.
 int getFrequency()
          Returns the Semcor frequency for this Lexeme.
 WordForm getHead()
          Returns the head WordForm of this Lexeme, if it is a compound noun, or null otherwise.
 String getID()
          Returns the unique ID of this Lexeme.
 List<String> getImageFile(SymbolType[] symbolSet)
           
 WordForm getModifier()
          Returns the modifier WordForm of this Lexeme, if it is a compound noun, or null otherwise.
 POS getPartOfSpeech()
          Returns the part of speech of this Lexeme.
 WordSequence getSpelling()
          Returns the spelling of this Lexeme.
 String getSQLSelectString()
          Returns a String that encodes how this Lexeme would be used in an SQL query, i.e. its ID enclosed within single quotes.
 WordForm getWordForm()
          Returns the WordForm associated with this Lexeme.
 int hashCode()
           
 boolean isCompoundNoun()
          Returns true if this Lexeme is a compound noun, or false otherwise.
static Lexeme readXML(Element e)
          Returns an instance of a Lexeme (as returned by Dictionary.getLexeme(String)) whose ID is contained within the given XML Element.
 String shortString()
          A short string representation of this Lexeme, with the following form: ID(ortho,pos).
 String toString()
          Returns the same String as shortString().
 String verboseString()
          A verbose string representation of this Lexeme, with the following form: ID(ortho,pos):gloss.
 void writeXML(Writer out, String indent)
          Implementation of XMLizable.writeXML(Writer, String).
 
Methods inherited from class standup.lexicon.Keyword
createKeyword, duplicate, readXMLList
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values

id

private final String id

ambiguityCount

private final int ambiguityCount

wordFormID

private final String wordFormID

conceptID

private final String conceptID

semcorFreq

private final int semcorFreq

fScore

private final float fScore

realConceptCodes

private final String[] realConceptCodes

pos

private final POS pos

isNounCompound

private final boolean isNounCompound

headWordFormID

private final String headWordFormID

modifierWordFormID

private final String modifierWordFormID

cachedWordForm

private transient WordForm cachedWordForm

cachedConcept

private transient Concept cachedConcept

cachedHeadWordForm

private transient WordForm cachedHeadWordForm

cachedModifierWordForm

private transient WordForm cachedModifierWordForm

cachedImageFiles

private transient List<String> cachedImageFiles

cachedSymbolSet

private transient SymbolType[] cachedSymbolSet
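
The transient cached* fields sit alongside serialized ID fields (wordFormID, conceptID and so on), which suggests that only the IDs are serialized and the referenced objects are looked up on first use. The sketch below assumes that reading; CachingRecordSketch, its resolver parameter and the String payload are hypothetical stand-ins for Lexeme, the Dictionary lookup and WordForm.

```java
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical sketch: an immutable, serializable record that stores only an ID
// and resolves the referenced object lazily into a transient (non-serialized) cache.
final class CachingRecordSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String wordFormID;          // serialized
    private transient String cachedWordForm;  // null after deserialization, rebuilt on demand

    CachingRecordSketch(String wordFormID) {
        this.wordFormID = wordFormID;
    }

    // Resolve on the first call, then reuse the cached value. Not synchronized:
    // concurrent callers may both resolve, which is harmless if the lookup is pure.
    String getWordForm(Function<String, String> resolver) {
        if (cachedWordForm == null) {
            cachedWordForm = resolver.apply(wordFormID);
        }
        return cachedWordForm;
    }
}
```

Because transient fields are reset to null when an object is deserialized, a deserialized instance simply resolves again on its next access.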

Constructor Detail

Lexeme

public Lexeme(String id,
              int ambiguityCount,
              String wordFormID,
              String conceptID,
              int semcorFreq,
              float fScore,
              String[] realConceptCodes,
              POS pos,
              boolean isNounCompound,
              String headWordFormID,
              String modifierWordFormID)
Constructor that provides all necessary details. This should only ever be called by ProtoLexeme.buildSerializedCache(String) when building the serialized hashtable of Lexemes to be used by Dictionary.

Parameters:
id - unique ID -- see STANDUP lexical database documentation for details
ambiguityCount - the number of Lexemes that have the same WordForm as this one (count is inclusive of this Lexeme)
wordFormID - the ID of this Lexeme's WordForm
conceptID - the ID of this Lexeme's Concept
semcorFreq - Semcor frequency
fScore - F-score
realConceptCodes - an array of 'real' concept codes, taken from Widgit wordlists
pos - this Lexeme's part-of-speech
isNounCompound - whether or not this Lexeme is a compound noun
headWordFormID - the ID of this Lexeme's head WordForm if it is a compound noun, or null otherwise
modifierWordFormID - the ID of this Lexeme's modifier WordForm if it is a compound noun, or null otherwise

Method Detail

readXML

public static Lexeme readXML(Element e)
Returns an instance of a Lexeme (as returned by Dictionary.getLexeme(String)) whose ID is contained within the given XML Element.

Parameters:
e - the XML Element containing the Lexeme's ID
Returns:
the Lexeme whose ID is contained within e
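
A minimal sketch of what readXML might do, assuming the element holds the ID as its text content (the format written by writeXML). ReadXmlSketch is hypothetical, a Map stands in for Dictionary.getLexeme(String), and throwing on an unknown ID is an assumption; only the standard JAXP and DOM calls are real APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class ReadXmlSketch {
    // The ID is assumed to be the element's text content, as in <lx>lx123456</lx>.
    static String readID(Element e) {
        return e.getTextContent().trim();
    }

    // Look the ID up; the Map stands in for Dictionary.getLexeme(String).
    // Throwing on an unknown ID is an assumption about error handling.
    static <T> T readXML(Element e, Map<String, T> dictionary) {
        String id = readID(e);
        T lexeme = dictionary.get(id);
        if (lexeme == null) {
            throw new IllegalArgumentException("unknown lexeme ID: " + id);
        }
        return lexeme;
    }

    // Helper for trying the sketch: parse an XML string and return its root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }
}
```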

getID

public String getID()
Returns the unique ID of this Lexeme.

Returns:
the unique ID of this Lexeme.

getAmbiguityCount

public int getAmbiguityCount()
Returns the number of Lexemes that share this Lexeme's orthography (count is inclusive of this Lexeme).

Returns:
the number of Lexemes that share this Lexeme's orthography (count is inclusive of this Lexeme).
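
The ambiguity count can be illustrated by grouping entries by spelling. This hypothetical sketch uses plain Strings for spellings; in STANDUP the count is supplied precomputed to the constructor, so it only shows what the number means: every entry is counted, including the one being asked about, so an unambiguous word has a count of 1.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the ambiguity count of an entry is the number of entries
// sharing its spelling, the entry itself included.
final class AmbiguityCountSketch {
    // One spelling per lexeme; the result maps each spelling to its count.
    static Map<String, Integer> countBySpelling(List<String> spellings) {
        Map<String, Integer> counts = new HashMap<>();
        for (String spelling : spellings) {
            counts.merge(spelling, 1, Integer::sum);
        }
        return counts;
    }
}
```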

getWordForm

public WordForm getWordForm()
Returns the WordForm associated with this Lexeme.

Returns:
the WordForm associated with this Lexeme.

getConcept

public Concept getConcept()
Returns the Concept associated with this Lexeme.

Returns:
the Concept associated with this Lexeme.

getFamiliarityScore

public float getFamiliarityScore()
Returns the F-score, or familiarity score, of this Lexeme.

Returns:
the F-score (familiarity score) of this Lexeme.

getFrequency

public int getFrequency()
Returns the Semcor frequency for this Lexeme.

Returns:
the Semcor frequency for this Lexeme.

getHead

public WordForm getHead()
Returns the head WordForm of this Lexeme, if it is a compound noun, or null otherwise.

Returns:
the head WordForm of this Lexeme if it is a compound noun, or null otherwise.

getImageFile

public List<String> getImageFile(SymbolType[] symbolSet)
                          throws SymbolException
Throws:
SymbolException

getModifier

public WordForm getModifier()
Returns the modifier WordForm of this Lexeme, if it is a compound noun, or null otherwise.

Returns:
the modifier WordForm of this Lexeme if it is a compound noun, or null otherwise.

getPartOfSpeech

public POS getPartOfSpeech()
Returns the part of speech of this Lexeme.

Returns:
the part of speech of this Lexeme.

getSpelling

public WordSequence getSpelling()
Returns the spelling of this Lexeme.

Specified by:
getSpelling in class StructElement
Returns:
the spelling of this Lexeme.

getSQLSelectString

public String getSQLSelectString()
Returns a String that encodes how this Lexeme would be used in an SQL query, i.e. its ID enclosed within single quotes, e.g. "'lx123456'".

Specified by:
getSQLSelectString in interface SQLSelectElement
Returns:
this Lexeme's ID enclosed within single quotes, e.g. "'lx123456'".
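
A minimal sketch of the documented quoting, plus one way such strings might be joined into an SQL IN list. SqlSelectSketch and inList are hypothetical, and doubling embedded single quotes is standard SQL escaping added here rather than something this documentation states.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the quoting described for getSQLSelectString().
final class SqlSelectSketch {
    // Wrap the ID in single quotes. Doubling embedded quotes is an addition;
    // IDs such as lx123456 contain none anyway.
    static String sqlSelectString(String id) {
        return "'" + id.replace("'", "''") + "'";
    }

    // Join several quoted IDs into "('lx1', 'lx2')" for a WHERE ... IN clause.
    static String inList(List<String> ids) {
        return ids.stream()
                .map(SqlSelectSketch::sqlSelectString)
                .collect(Collectors.joining(", ", "(", ")"));
    }
}
```

The column such a list would be compared against is not documented here.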

isCompoundNoun

public boolean isCompoundNoun()
Returns true if this Lexeme is a compound noun, or false otherwise.

Returns:
true if this Lexeme is a compound noun, false otherwise.
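
Because getHead() and getModifier() return null for non-compounds, callers should check isCompoundNoun() first. The hypothetical CompoundNounSketch below uses Strings in place of WordForm objects to show that guard.

```java
// Hypothetical sketch: head and modifier exist only for compound nouns and are
// null otherwise. Strings stand in for the WordForm objects of the real class.
final class CompoundNounSketch {
    private final boolean isNounCompound;
    private final String head;      // null unless isNounCompound
    private final String modifier;  // null unless isNounCompound

    CompoundNounSketch(boolean isNounCompound, String head, String modifier) {
        this.isNounCompound = isNounCompound;
        this.head = head;
        this.modifier = modifier;
    }

    boolean isCompoundNoun() { return isNounCompound; }
    String getHead() { return head; }
    String getModifier() { return modifier; }

    // Guard on isCompoundNoun() so head and modifier are only used when non-null.
    String describe() {
        if (!isCompoundNoun()) {
            return "not a compound";
        }
        return "modifier=" + getModifier() + ", head=" + getHead();
    }
}
```

For "tree house", new CompoundNounSketch(true, "house", "tree").describe() gives modifier=tree, head=house.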

toString

public String toString()
Returns the same String as shortString(), e.g. lx081537(bank,n).
Overrides:
toString in class Object

shortString

public String shortString()
A short string representation of this Lexeme, with the following form: ID(ortho,pos). For example, lx081537(bank,n). This is also the String returned by toString().

Returns:
a short string representation of the form ID(ortho,pos).
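
The documented shortString() format is simple string concatenation. In this hypothetical sketch, plain String parameters stand in for getID(), the orthography from getSpelling() and the part of speech from getPartOfSpeech().

```java
// Hypothetical sketch of the documented shortString() format ID(ortho,pos).
final class ShortStringSketch {
    static String shortString(String id, String ortho, String pos) {
        return id + "(" + ortho + "," + pos + ")";
    }
}
```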

verboseString

public String verboseString()
A verbose string representation of this Lexeme, with the following form: ID(ortho,pos):gloss. For example, lx081537(bank,n):a long ridge or pile; "a huge bank of earth".

Returns:
a verbose string representation of the form ID(ortho,pos):gloss.
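
verboseString() extends the short form with a gloss. In this hypothetical sketch the gloss is passed in as a String; where the real method obtains it (presumably from the Lexeme's Concept) is an assumption.

```java
// Hypothetical sketch of the documented verboseString() format ID(ortho,pos):gloss.
final class VerboseStringSketch {
    // Same format as shortString(): ID(ortho,pos).
    static String shortString(String id, String ortho, String pos) {
        return id + "(" + ortho + "," + pos + ")";
    }

    // The gloss is supplied by the caller; its real source is an assumption.
    static String verboseString(String id, String ortho, String pos, String gloss) {
        return shortString(id, ortho, pos) + ":" + gloss;
    }
}
```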

writeXML

public void writeXML(Writer out,
                     String indent)
              throws IOException
Implementation of XMLizable.writeXML(Writer, String). Writes out this Lexeme's ID in an lx tag, e.g. <lx>lx123456</lx>.

Specified by:
writeXML in interface XMLizable
Parameters:
out - The output stream for the XML file, which is assumed to be already opened and writable.
indent - A string to be prepended before every line written by this method. If passed appropriate white space, e.g. XMLUtils.xmlIndent, it can be used to control indentation.
Throws:
IOException
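
A minimal sketch of the documented output, writing the ID inside an lx element after the indent string. WriteXmlSketch is hypothetical, and whether the real method ends the line with a line separator is an assumption.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch of the documented writeXML output: indent, then <lx>ID</lx>.
// The trailing line separator is an assumption, not documented.
final class WriteXmlSketch {
    static void writeXML(String id, Writer out, String indent) throws IOException {
        out.write(indent);
        out.write("<lx>");
        out.write(id); // IDs such as lx123456 need no XML escaping
        out.write("</lx>");
        out.write(System.lineSeparator());
    }

    // Convenience wrapper for trying the sketch in memory.
    static String toXml(String id, String indent) {
        StringWriter buffer = new StringWriter();
        try {
            writeXML(id, buffer, indent);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringWriter does not actually throw
        }
        return buffer.toString();
    }
}
```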

hashCode

public int hashCode()
Overrides:
hashCode in class Object

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

compareTo

public int compareTo(Object arg0)
This implementation of compareTo is consistent with equals(Object). It first compares the Lexemes' getSpelling() values and, only if they are equal, uses getID() as a tiebreaker.

Specified by:
compareTo in interface Comparable
Parameters:
arg0 - the Lexeme to be compared to this object
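Returns:
a negative integer, zero, or a positive integer as this Lexeme is less than, equal to, or greater than arg0.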
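
The ordering described above can be sketched as a class that compares spellings first and falls back on IDs. LexemeOrderSketch is hypothetical: it uses a String where Lexeme uses a WordSequence, implements the generic Comparable<T> rather than the raw Comparable, and defines equality as matching spelling and ID so that compareTo returns 0 exactly when equals returns true.

```java
import java.util.Objects;

// Hypothetical sketch of the documented ordering: spelling first, ID as tiebreaker.
final class LexemeOrderSketch implements Comparable<LexemeOrderSketch> {
    private final String spelling;
    private final String id;

    LexemeOrderSketch(String spelling, String id) {
        this.spelling = spelling;
        this.id = id;
    }

    @Override
    public int compareTo(LexemeOrderSketch other) {
        int bySpelling = spelling.compareTo(other.spelling);
        return bySpelling != 0 ? bySpelling : id.compareTo(other.id);
    }

    // compareTo returns 0 exactly when both fields match, so equals must agree.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof LexemeOrderSketch)) return false;
        LexemeOrderSketch o = (LexemeOrderSketch) obj;
        return spelling.equals(o.spelling) && id.equals(o.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(spelling, id);
    }

    @Override
    public String toString() {
        return id + "(" + spelling + ")";
    }
}
```

Sorting (bank, lx2), (apple, lx9) and (bank, lx1) puts apple first, followed by the two bank entries ordered by ID.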