org.hd.d.pg2k.ai.scorer
Class ScorerPopulation

java.lang.Object
  extended by org.hd.d.pg2k.ai.scorer.ScorerPopulation

public final class ScorerPopulation
extends java.lang.Object

Thread-safe container for Scorer-and-parameter information to maintain an evolving population. This separates Scorers by basic (parameterless) name, on the grounds that they are not directly comparable.

This has limited capacity (so as to cap resource use), and discards the least good parameterised Scorers as needed to make way for new (better) instances. (These discarded less-good Scorers may have their results retained in a smalled secondary "negative" cache to speed up re-testing.)

This has the automatially persists/announces/shares new/good instances using the system variables events mechanism, local and global.

TODO: in case of accute memory shortage abandon least-good Scorers


Nested Class Summary
static interface ScorerPopulation.NewBestCallbackIF
          Interface of call-back hook to report a new "best" Scorer by name, eg to the system event record.
 
Field Summary
private  MemoryTools.SimpleLRUMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> _backstopLRU
          A back-stop LRU cache of Scorer goodness, so that in particular the results of poor Scorers need not be excessively recomputed; never null.
private  int _backstopLRUMaxSize
          Maximum size/capacity of _backstopLRU; strictly positive.
private  java.util.concurrent.ConcurrentMap<java.lang.String,java.util.List<java.lang.String>> _gBS_cache
          Cache of results for getBreadingSet(); never null.
private  java.util.Set<java.lang.String> _gCSWP_cache
          Cache of result from getCurrentScorersWithParameters(); may be null if nothing cached.
private  MemoryTools.SimpleLRUMap<java.lang.String,ScorerIF> _gSI_cache
          Small thread-safe LRU cache for getScorerInstance(); never null.
private  java.lang.Object _lock
          Non-user-visible lock to serialise activity on _scorerScores and _scorerByGoodness; never null.
private  java.util.Map<java.lang.String,java.util.NavigableSet<java.lang.String>> _scorerByGoodness
          Map from base Scorer name to a least-good-first NavigableSet of full nameAndParameters; never null.
private  java.util.concurrent.ConcurrentMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> _scorerScores
          Thread-safe cache of Scorer scores (and time when cached), indirectly size-limited; never null.
private  java.util.Map<java.lang.String,ScorerIF> baseScorers
          Immutable Map from base Scorer names (no parameters) to base/parameterless instances to monitor/maintain; non-null, non-empty.
private  ScorerPopulation.NewBestCallbackIF callback
          Callback to report new best-of-breed being inserted into cache; may be null.
private  java.util.concurrent.atomic.AtomicInteger countdownToNextScavenge
          Countdown to next scavenge: when zero then scavenge and add on current population size.
private static boolean FORCE_MAJOR_COLLECTION_AFTER_MINOR_FAILS
          If true then if a minor scavenge fails always force a major scavenge.
static java.lang.String GEN_PROPS_GLOBAL_POST_ENABLE_SUFFIX
          GenProps 'generic' parameter that must be set for us to post to the global event store.
private  int maxPopulationSize
          Maximum number of Scorer details to retain, possibly excluding the fixed/base Scorers, strictly positive.
private static boolean SCAVENGE_FOR_DIVERSITY
          If true then we may kick out apparent-near duplicates as the population reaches maximum.
 java.util.Comparator<java.lang.String> SCORER_RANK_COMPARATOR
          Comparator used to rank Scorers relative to one another by increasing goodness, never null.
private static int SIMILAR_SCORE_DELTA
          Minimum 'delta' in score for two Scorers to be considered potentially similar; non-negative.
private  java.util.Set<java.lang.Class<? extends ScorerIF>> supportedScorerClasses
          Set of class values for the supported Scorer types for fast validation; never null.
private static boolean USE_GLOBAL_EVENTS_ALSO
          If true, allow posting the global event for exchange of the best/new Scorers.
 
Constructor Summary
ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers, int maxPopulationSize, SimpleExhibitPipelineIF vars)
          Construct with a list of the base Scorers (names, no parameters) to keep track of.
ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers, ScorerPopulation.NewBestCallbackIF callback, int maxPopulationSize)
          Construct with a list of the base Scorers (names, no parameters) to keep track of.
 
Method Summary
private  void _clearCaches()
          Clear any caches which depend on the population state remaining unchanged.
private  Tuple.Pair<java.lang.Long,ScoreAndConf> _remove(java.lang.String nameAndParameters)
          Remove and return any extant value for this Scorer value, null if none previously.
private  void _scavenge(java.lang.String largestSubPopulationBasename)
          Attempt to strip out near-clones from the population to make space for more diversity.
 void clear()
          Reset/clear the population to its initial (empty) state.
 java.util.List<java.lang.String> getBreedingSet(java.lang.String baseName)
          Get immutable random-access list of Scorers to "breed" from, best first; never null, possibly empty.
 java.util.Set<java.lang.String> getCurrentScorersWithParameters(boolean allowStale)
          Current immutable set of available Scorer names and parameters (where applicable); never null but may be empty.
 ScorerIF getScorerInstance(java.lang.String nameAndParameters)
          Get Scorer instance given the Scorer{:value=name}* format; null if no such Scorer available.
 ScoreAndConf getScorerWeighting(ScorerIF scorer, boolean allowStale)
          ScoreAndConfidence for the given Scorer over all exhibit types; null if not known.
 ScoreAndConf getScorerWeighting(java.lang.String scorerNameAndParameters, boolean allowStale)
          ScoreAndConfidence for the given Scorer over all exhibit types; null if not known.
 int maxSize()
          Return the target maximum population size; non-negative.
 boolean putScorerWeighting(ScorerIF scorer, ScoreAndConf sac, SimpleExhibitPipelineIF dataSource, SimpleLoggerIF log)
          Save the ScoreAndConfidence for the given Scorer over all exhibit types.
 boolean putScorerWeighting(java.lang.String nameAndParameters, ScoreAndConf sac, SimpleExhibitPipelineIF dataSource, SimpleLoggerIF log)
          Save the ScoreAndConfidence for the given Scorer over all exhibit types.
 int size()
          Get current population size; non-negative.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SCORER_RANK_COMPARATOR

public final java.util.Comparator<java.lang.String> SCORER_RANK_COMPARATOR
Comparator used to rank Scorers relative to one another by increasing goodness, never null. Scorer values not currently represented in the population are treated as if goodness 0 (ie neutral).

If both goodnesses are available a secondary ranking is done by increasing score.

Complete ties are broken by Scorer name.


SCAVENGE_FOR_DIVERSITY

private static final boolean SCAVENGE_FOR_DIVERSITY
If true then we may kick out apparent-near duplicates as the population reaches maximum. The aim of doing this is to increase diversity of the 'gene pool' for a given capacity, by making space for significantly different genotypes and avoiding them being swamped or pushed out by many near-clones of the overal best Scorer in any given sub-population.

This scavenging may have a random element.

Scavenging may be quite slow.

See Also:
Constant Field Values

maxPopulationSize

private final int maxPopulationSize
Maximum number of Scorer details to retain, possibly excluding the fixed/base Scorers, strictly positive. Must in fact be large than the number of base Scorer types, since we will need at least once slot for each non-parameterised type.


baseScorers

private final java.util.Map<java.lang.String,ScorerIF> baseScorers
Immutable Map from base Scorer names (no parameters) to base/parameterless instances to monitor/maintain; non-null, non-empty.


supportedScorerClasses

private final java.util.Set<java.lang.Class<? extends ScorerIF>> supportedScorerClasses
Set of class values for the supported Scorer types for fast validation; never null.


callback

private final ScorerPopulation.NewBestCallbackIF callback
Callback to report new best-of-breed being inserted into cache; may be null.


_backstopLRU

private final MemoryTools.SimpleLRUMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> _backstopLRU
A back-stop LRU cache of Scorer goodness, so that in particular the results of poor Scorers need not be excessively recomputed; never null. Normally we only keep the ScoreAndConf of the best Scorers, but if we happen to be repeatedly asked for the results of poor Scorers and we don't hold those results because they are not good to be in the main population then we may simply waste disproportionate time recomputing them.

This aims to be a bonus back-stop cache of all results not in the main cache, and is not strictly part of the counted population because these are not part of the potential "breeding set": they are here only for "negative cacheing" of poor Scorer parameters sets.

We keep it much smaller than the main population so that it is not a significant resource drain.

We push Scorers into it when they are displaced from the main population.

Thread-safe.

It is permissible and viable to hold a lock on this object over multiple operations to make them into one compound atomic operation.


_backstopLRUMaxSize

private final int _backstopLRUMaxSize
Maximum size/capacity of _backstopLRU; strictly positive.


_lock

private final java.lang.Object _lock
Non-user-visible lock to serialise activity on _scorerScores and _scorerByGoodness; never null. No blocking/external calls are made in the scope of this lock so as to ensure that it is held only for very short intervals.


_scorerScores

private final java.util.concurrent.ConcurrentMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> _scorerScores
Thread-safe cache of Scorer scores (and time when cached), indirectly size-limited; never null. Map from scorer name-and-parameters to ScoreAndConfidence and the time computed/cached.

Single get() accesses to this can be made safely without further locking so as to maximise concurrency.

Note that since a put() operation does not lock out get() operations, any updates on the main _scorerScores map must be incrementally valid, ie incorrect data must never be inserted into the _scorerScores map.


_scorerByGoodness

private final java.util.Map<java.lang.String,java.util.NavigableSet<java.lang.String>> _scorerByGoodness
Map from base Scorer name to a least-good-first NavigableSet of full nameAndParameters; never null. We pre-populate this with empty maps under each base name that we handle (so it is never necessary to test for presence in, or ever alter, this outer map; hence it is immutable).

The top-level Comparator is selected to sort by goodness derived from the _scorerScores, so any extant entry in the nameAndParameters should be removed from the _scorerByGoodness map before removal from or replacement in the _scorerScores map, and re-inserted afterwards as necessary.

Since the different base Scorer names/types are held separately, this helps keep the probably-distinct populations from competing with one another, though the overall population size cap is still present.

Because of a potential lack of symmetry in scoring (ie reversing the prediction of a bad Scorer does not necessarily make it a good one) we are interested in preserving the highest-confidence, positive-correlation Scorers.

Not thread-safe; access only under _lock.


GEN_PROPS_GLOBAL_POST_ENABLE_SUFFIX

public static final java.lang.String GEN_PROPS_GLOBAL_POST_ENABLE_SUFFIX
GenProps 'generic' parameter that must be set for us to post to the global event store.

See Also:
Constant Field Values

countdownToNextScavenge

private final java.util.concurrent.atomic.AtomicInteger countdownToNextScavenge
Countdown to next scavenge: when zero then scavenge and add on current population size.


SIMILAR_SCORE_DELTA

private static final int SIMILAR_SCORE_DELTA
Minimum 'delta' in score for two Scorers to be considered potentially similar; non-negative. This is the minimum difference in the raw score, not weighted by confidence, for two Scorers on similar overall goodness to be considered 'similar', given that we will then further filter by genotype and/or phenotype, the assumption being that differences of this scale may just be (rounding and other) noise.

A good value for this is probably less than 1% of maximum, even though many Scorers will struggle to be much better than chance than this.

Note that we may apply a tighter test to values very close to zero.

A value of zero indicates that scores have to be identical to be considered similar by this measure.


FORCE_MAJOR_COLLECTION_AFTER_MINOR_FAILS

private static final boolean FORCE_MAJOR_COLLECTION_AFTER_MINOR_FAILS
If true then if a minor scavenge fails always force a major scavenge. This may be expensive but more-or-less guarantees that there will always be some space in the population for marginal but unusual Scorers to have a chance to be bred from.

See Also:
Constant Field Values

USE_GLOBAL_EVENTS_ALSO

private static final boolean USE_GLOBAL_EVENTS_ALSO
If true, allow posting the global event for exchange of the best/new Scorers. The mechanism must also be enabled in the GenProps for this to happen.

See Also:
Constant Field Values

_gCSWP_cache

private volatile java.util.Set<java.lang.String> _gCSWP_cache
Cache of result from getCurrentScorersWithParameters(); may be null if nothing cached. Initially null, and cleared on any update (put) to the population.

Any value cached here must be immutable for safety and sanity.

Marked volatile for thread-safe lock-free access.


_gBS_cache

private final java.util.concurrent.ConcurrentMap<java.lang.String,java.util.List<java.lang.String>> _gBS_cache
Cache of results for getBreadingSet(); never null. Cleared by any put/update operation on the population.

Thread-safe map from (valid) baseName to breeding set.


_gSI_cache

private final MemoryTools.SimpleLRUMap<java.lang.String,ScorerIF> _gSI_cache
Small thread-safe LRU cache for getScorerInstance(); never null. Intended to avoid repeated re-parsing of args and recreation of instances, etc, for very-commonly-used Scorers used to evaluate every exhibit. Usually we'd expect this to be a handful actually needed regularly, but while hunting for new solutions (or scavenging for space) we'll expect to churn this cache quite hard.

Constructor Detail

ScorerPopulation

public ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers,
                        ScorerPopulation.NewBestCallbackIF callback,
                        int maxPopulationSize)
Construct with a list of the base Scorers (names, no parameters) to keep track of. Note that we may slightly exceed the maximum specified population size for expediency, and we may dump some (less good) part of the population under memory stress.

Parameters:
baseScorers - non-null, non-empty, pure-base-name seed set
maxPopulationSize - maximum number of Scorer details to retain in memory, possibly excluding the fixed/base Scorers, strictly positive; this value is advisory and may be slightly exceeded for performance
callback - used to report new best-of-breed being inserted into cache; may be null

ScorerPopulation

public ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers,
                        int maxPopulationSize,
                        SimpleExhibitPipelineIF vars)
Construct with a list of the base Scorers (names, no parameters) to keep track of. Note that we may slightly exceed the maximum specified population size for expediency, and we may dump some (less good) part of the population under memory stress.

This automatically posts new best-of-breed Scorers to the system variables event store as they are cached.

Parameters:
baseScorers - non-null, non-empty, pure-base-name seed set
maxPopulationSize - maximum number of Scorer details to retain in memory, possibly excluding the fixed/base Scorers, strictly positive; this value is advisory and may be slightly exceeded for performance
vars - system variables pipeline to record new/best Scorer events to
Method Detail

clear

public void clear()
Reset/clear the population to its initial (empty) state. Does not shut out all other activity, so it is possible that new items may be inserted while this is running.


getScorerWeighting

public ScoreAndConf getScorerWeighting(ScorerIF scorer,
                                       boolean allowStale)
ScoreAndConfidence for the given Scorer over all exhibit types; null if not known. Never recomputes, always fetches from its cache if present, else returns null.

Older results (typically up to 1 day older) may be returned if allowStale is true; this may help the system overall ride out temporary I/O problems and disconnections.

These get operations are generally fast, though some cycle-stealing incremental housekeeping might be performed on this call.

Parameters:
scorer - the Scorer instance; never null
allowStale - if true then allow an older cache response (up to VLONG period older)
Returns:
the score represents the correlation with the underlying votes (and whatever the scoring is measured against) with MAX meaning perfect correlation, 0 meaning no correlation, and -MAX meaning perfectly wrong answers all the time, and the confidence 0 if we have no (or very few) data points yet and approaching MAX as we have a large (enough) number of data points
Throws:
java.lang.IllegalArgumentException - if the Scorer is null

getScorerWeighting

public ScoreAndConf getScorerWeighting(java.lang.String scorerNameAndParameters,
                                       boolean allowStale)
ScoreAndConfidence for the given Scorer over all exhibit types; null if not known. Never recomputes, always fetches from its cache if present, else returns null.

Older results (typically up to 1 day older) may be returned if allowStale is true; this may help the system overall ride out temporary I/O problems and disconnections.

These get operations are generally fast, though some cycle-stealing incremental housekeeping might be performed on this call.

Parameters:
scorerNameAndParameters - the name and parameters of the Scorer; never null
allowStale - if true then allow an older cache response (up to VLONG period older)
Returns:
the score represents the correlation with the underlying votes (and whatever the scoring is measured against) with MAX meaning perfect correlation, 0 meaning no correlation, and -MAX meaning perfectly wrong answers all the time, and the confidence 0 if we have no (or very few) data points yet and approaching MAX as we have a large (enough) number of data points

putScorerWeighting

public boolean putScorerWeighting(ScorerIF scorer,
                                  ScoreAndConf sac,
                                  SimpleExhibitPipelineIF dataSource,
                                  SimpleLoggerIF log)
Save the ScoreAndConfidence for the given Scorer over all exhibit types. Saves the given Scorer and "goodness", if necessary evicting a poorly-performing Scorer to make room ie to limit the population size.

If the "put" Scorer is very good compared to the existing population members then it may get persisted/shared via the event mechanism.

This routine will only consider storing entries for Scorers whose base name is on the list we were constructed with.

Note that a put() operation does not lock out get() operations, so any updates on the main _scorerScores map must be incrementally valid, ie incorrect data must never be inserted into the _scorerScores map.

This will attempt to cache the actual Scorer instance provided which may save time reconstructing it later, and memory by better sharing. This instance may be recovered with getScorerInstance() later, though may have to be recreated (ie returning the same instance is never guaranteed).

Parameters:
scorer - the Scorer instance; never null
sac - rating of this Scorer across all exhibits; never null
dataSource - to record/persist/share good Scorers through; null if no Scorer persistence is required
log - logger; never null
Returns:
true if the Scorer result is new for its sub-population and best in it and has a positive overall "goodness" (ie has some positive predictive ability)
Throws:
java.lang.IllegalArgumentException - if the Scorer is null or of an unsupported type

putScorerWeighting

public boolean putScorerWeighting(java.lang.String nameAndParameters,
                                  ScoreAndConf sac,
                                  SimpleExhibitPipelineIF dataSource,
                                  SimpleLoggerIF log)
Save the ScoreAndConfidence for the given Scorer over all exhibit types. Saves the given Scorer and "goodness", if necessary evicting a poorly-performing Scorer to make room ie to limit the population size.

If the "put" Scorer is very good compared to the existing population members then it may get persisted/shared via the event mechanism.

This routine will only consider storing entries for Scorers whose base name is on the list we were constructed with.

Note that a put() operation does not lock out get() operations, so any updates on the main _scorerScores map must be incrementally valid, ie incorrect data must never be inserted into the _scorerScores map.

Parameters:
nameAndParameters - the name and parameters of the scorer; never null
sac - rating of this Scorer across all exhibits; never null
dataSource - to record/persist/share good Scorers through; null if no Scorer persistence is required
log - logger; never null
Returns:
true if the Scorer result is new for its sub-population and best in it and has a positive overall "goodness" (ie has some positive predictive ability)

_remove

private Tuple.Pair<java.lang.Long,ScoreAndConf> _remove(java.lang.String nameAndParameters)
Remove and return any extant value for this Scorer value, null if none previously. Must only be called under the data lock to be atomic wrt the data stores.

Parameters:
nameAndParameters - non-null valid Scorer value incluing optional parameters
Returns:
previous timestamp/value for this Scorer, else null if nne

_scavenge

private void _scavenge(java.lang.String largestSubPopulationBasename)
Attempt to strip out near-clones from the population to make space for more diversity. This works on the assumption that there may be many valuable Scorer values that while not optimal across all exhibits or at this very moment, may be valuable on sub-sets of the exhibits now or in the near future. This routine attempts to prevent limited population space (gene-pool capacity) from being overrun with very slight variations on (effective clones of) a monoculture.

Currently this is done by elimination of genotypic near-clones by eliminating those Scorers that have very similar Scores and no sufficiently-distinct genes.

TODO: In future this may be done by elimination of phenotypic near clones by eliminating those Scorers that are have the same predction performance over a given calbration set.

This may not trim out anything from the specified sub-population.

This will never remove the 'best' item nor the base/parameterless Scorer.

Values removed by scavenging may be pushed into the back-stop LRU cache in case they are used again soon by the outside world; thus we may limit the amount of any scavengine we do in one go so as to avoid overflowing this back-stop cache.

This tries a series of more drastic pruning measures until it zaps some/enough Scorers, with the inital pruning passes being very quick and gentle.

This does not guarantee to remove any Scorers at all.

This must be called under the data lock.

Run time is expected to average about O(1) (peak O(n)) for a size-n sub-population.

Parameters:
largestSubPopulationBasename -

_clearCaches

private void _clearCaches()
Clear any caches which depend on the population state remaining unchanged. This clearing is done within the lock that protects the internal data structures (even though the caches can be safely accessed outside the lock) to minimise unwanted races (and eliminate them for lock-protected cache accesses).


size

public int size()
Get current population size; non-negative. This value may slightly exceed the requested maximum.

This does not include the count of any entries in the back-stop LRU (negative) cache.

Returns:
count of unique Scorers that might be selected as part of a "breeding set"

maxSize

public int maxSize()
Return the target maximum population size; non-negative. The actual population size may exceed this somewhat; this is a target.


getCurrentScorersWithParameters

public java.util.Set<java.lang.String> getCurrentScorersWithParameters(boolean allowStale)
Current immutable set of available Scorer names and parameters (where applicable); never null but may be empty. The values returned are of the form ScorerName{:name=value}*.

The Scorers returned by this will generally be the best available, suitable for making a real prediction of and exhibit score, and will usually be the best one or handful per base-Scorer type, allowing that different Scorer types may have different "skill" domains, and excluding any with non-positive goodness.

This routine may be expensive, though will try to be efficient.


getBreedingSet

public java.util.List<java.lang.String> getBreedingSet(java.lang.String baseName)
Get immutable random-access list of Scorers to "breed" from, best first; never null, possibly empty. This returns all of (or some of the best of) the Scorers with the given base name.

This generally trims away all Scorers with non-positive values, but will always try to return the few best items regardless, ie the result is not completely empty unless the specified sub-population is.

The result will generally be large if the population is.


getScorerInstance

public ScorerIF getScorerInstance(java.lang.String nameAndParameters)
Get Scorer instance given the Scorer{:value=name}* format; null if no such Scorer available. Any instance returned may be a shared/cached instance rather than a new instance.

This does not attempt to canonicalise the names it is given, so may create and cache duplicates for example.

This can be used to create any valid Scorer instance, not just entries in the given population.


DHD Multimedia Gallery V1.50.55

Copyright (c) 1996-2008, Damon Hart-Davis. All rights reserved.