|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectorg.hd.d.pg2k.ai.scorer.ScorerPopulation
public final class ScorerPopulation
Thread-safe container for Scorer-and-parameter information to maintain an evolving population. This separates Scorers by basic (parameterless) name, on the grounds that they are not directly comparable.
This has limited capacity (so as to cap resource use), and discards the least good parameterised Scorers as needed to make way for new (better) instances. (These discarded less-good Scorers may have their results retained in a smalled secondary "negative" cache to speed up re-testing.)
This has the automatially persists/announces/shares new/good instances using the system variables events mechanism, local and global.
TODO: in case of accute memory shortage abandon least-good Scorers
| Nested Class Summary | |
|---|---|
static interface |
ScorerPopulation.NewBestCallbackIF
Interface of call-back hook to report a new "best" Scorer by name, eg to the system event record. |
| Field Summary | |
|---|---|
private MemoryTools.SimpleLRUMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> |
_backstopLRU
A back-stop LRU cache of Scorer goodness, so that in particular the results of poor Scorers need not be excessively recomputed; never null. |
private int |
_backstopLRUMaxSize
Maximum size/capacity of _backstopLRU; strictly positive. |
private java.util.concurrent.ConcurrentMap<java.lang.String,java.util.List<java.lang.String>> |
_gBS_cache
Cache of results for getBreadingSet(); never null. |
private java.util.Set<java.lang.String> |
_gCSWP_cache
Cache of result from getCurrentScorersWithParameters(); may be null if nothing cached. |
private MemoryTools.SimpleLRUMap<java.lang.String,ScorerIF> |
_gSI_cache
Small thread-safe LRU cache for getScorerInstance(); never null. |
private java.lang.Object |
_lock
Non-user-visible lock to serialise activity on _scorerScores and _scorerByGoodness; never null. |
private java.util.Map<java.lang.String,java.util.NavigableSet<java.lang.String>> |
_scorerByGoodness
Map from base Scorer name to a least-good-first NavigableSet of full nameAndParameters; never null. |
private java.util.concurrent.ConcurrentMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> |
_scorerScores
Thread-safe cache of Scorer scores (and time when cached), indirectly size-limited; never null. |
private java.util.Map<java.lang.String,ScorerIF> |
baseScorers
Immutable Map from base Scorer names (no parameters) to base/parameterless instances to monitor/maintain; non-null, non-empty. |
private ScorerPopulation.NewBestCallbackIF |
callback
Callback to report new best-of-breed being inserted into cache; may be null. |
private java.util.concurrent.atomic.AtomicInteger |
countdownToNextScavenge
Countdown to next scavenge: when zero then scavenge and add on current population size. |
private static boolean |
FORCE_MAJOR_COLLECTION_AFTER_MINOR_FAILS
If true then if a minor scavenge fails always force a major scavenge. |
static java.lang.String |
GEN_PROPS_GLOBAL_POST_ENABLE_SUFFIX
GenProps 'generic' parameter that must be set for us to post to the global event store. |
private int |
maxPopulationSize
Maximum number of Scorer details to retain, possibly excluding the fixed/base Scorers, strictly positive. |
private static boolean |
SCAVENGE_FOR_DIVERSITY
If true then we may kick out apparent-near duplicates as the population reaches maximum. |
java.util.Comparator<java.lang.String> |
SCORER_RANK_COMPARATOR
Comparator used to rank Scorers relative to one another by increasing goodness, never null. |
private static int |
SIMILAR_SCORE_DELTA
Minimum 'delta' in score for two Scorers to be considered potentially similar; non-negative. |
private java.util.Set<java.lang.Class<? extends ScorerIF>> |
supportedScorerClasses
Set of class values for the supported Scorer types for fast validation; never null. |
private static boolean |
USE_GLOBAL_EVENTS_ALSO
If true, allow posting the global event for exchange of the best/new Scorers. |
| Constructor Summary | |
|---|---|
ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers,
int maxPopulationSize,
SimpleExhibitPipelineIF vars)
Construct with a list of the base Scorers (names, no parameters) to keep track of. |
|
ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers,
ScorerPopulation.NewBestCallbackIF callback,
int maxPopulationSize)
Construct with a list of the base Scorers (names, no parameters) to keep track of. |
|
| Method Summary | |
|---|---|
private void |
_clearCaches()
Clear any caches which depend on the population state remaining unchanged. |
private Tuple.Pair<java.lang.Long,ScoreAndConf> |
_remove(java.lang.String nameAndParameters)
Remove and return any extant value for this Scorer value, null if none previously. |
private void |
_scavenge(java.lang.String largestSubPopulationBasename)
Attempt to strip out near-clones from the population to make space for more diversity. |
void |
clear()
Reset/clear the population to its initial (empty) state. |
java.util.List<java.lang.String> |
getBreedingSet(java.lang.String baseName)
Get immutable random-access list of Scorers to "breed" from, best first; never null, possibly empty. |
java.util.Set<java.lang.String> |
getCurrentScorersWithParameters(boolean allowStale)
Current immutable set of available Scorer names and parameters (where applicable); never null but may be empty. |
ScorerIF |
getScorerInstance(java.lang.String nameAndParameters)
Get Scorer instance given the Scorer{:value=name}* format; null if no such Scorer available. |
ScoreAndConf |
getScorerWeighting(ScorerIF scorer,
boolean allowStale)
ScoreAndConfidence for the given Scorer over all exhibit types; null if not known. |
ScoreAndConf |
getScorerWeighting(java.lang.String scorerNameAndParameters,
boolean allowStale)
ScoreAndConfidence for the given Scorer over all exhibit types; null if not known. |
int |
maxSize()
Return the target maximum population size; non-negative. |
boolean |
putScorerWeighting(ScorerIF scorer,
ScoreAndConf sac,
SimpleExhibitPipelineIF dataSource,
SimpleLoggerIF log)
Save the ScoreAndConfidence for the given Scorer over all exhibit types. |
boolean |
putScorerWeighting(java.lang.String nameAndParameters,
ScoreAndConf sac,
SimpleExhibitPipelineIF dataSource,
SimpleLoggerIF log)
Save the ScoreAndConfidence for the given Scorer over all exhibit types. |
int |
size()
Get current population size; non-negative. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public final java.util.Comparator<java.lang.String> SCORER_RANK_COMPARATOR
If both goodnesses are available a secondary ranking is done by increasing score.
Complete ties are broken by Scorer name.
private static final boolean SCAVENGE_FOR_DIVERSITY
This scavenging may have a random element.
Scavenging may be quite slow.
private final int maxPopulationSize
private final java.util.Map<java.lang.String,ScorerIF> baseScorers
private final java.util.Set<java.lang.Class<? extends ScorerIF>> supportedScorerClasses
private final ScorerPopulation.NewBestCallbackIF callback
private final MemoryTools.SimpleLRUMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> _backstopLRU
This aims to be a bonus back-stop cache of all results not in the main cache, and is not strictly part of the counted population because these are not part of the potential "breeding set": they are here only for "negative cacheing" of poor Scorer parameters sets.
We keep it much smaller than the main population so that it is not a significant resource drain.
We push Scorers into it when they are displaced from the main population.
Thread-safe.
It is permissible and viable to hold a lock on this object over multiple operations to make them into one compound atomic operation.
private final int _backstopLRUMaxSize
private final java.lang.Object _lock
private final java.util.concurrent.ConcurrentMap<java.lang.String,Tuple.Pair<java.lang.Long,ScoreAndConf>> _scorerScores
Single get() accesses to this can be made safely without further locking so as to maximise concurrency.
Note that since a put() operation does not lock out get() operations, any updates on the main _scorerScores map must be incrementally valid, ie incorrect data must never be inserted into the _scorerScores map.
private final java.util.Map<java.lang.String,java.util.NavigableSet<java.lang.String>> _scorerByGoodness
The top-level Comparator is selected to sort by goodness derived from the _scorerScores, so any extant entry in the nameAndParameters should be removed from the _scorerByGoodness map before removal from or replacement in the _scorerScores map, and re-inserted afterwards as necessary.
Since the different base Scorer names/types are held separately, this helps keep the probably-distinct populations from competing with one another, though the overall population size cap is still present.
Because of a potential lack of symmetry in scoring (ie reversing the prediction of a bad Scorer does not necessarily make it a good one) we are interested in preserving the highest-confidence, positive-correlation Scorers.
Not thread-safe; access only under _lock.
public static final java.lang.String GEN_PROPS_GLOBAL_POST_ENABLE_SUFFIX
private final java.util.concurrent.atomic.AtomicInteger countdownToNextScavenge
private static final int SIMILAR_SCORE_DELTA
A good value for this is probably less than 1% of maximum, even though many Scorers will struggle to be much better than chance than this.
Note that we may apply a tighter test to values very close to zero.
A value of zero indicates that scores have to be identical to be considered similar by this measure.
private static final boolean FORCE_MAJOR_COLLECTION_AFTER_MINOR_FAILS
private static final boolean USE_GLOBAL_EVENTS_ALSO
private volatile java.util.Set<java.lang.String> _gCSWP_cache
Any value cached here must be immutable for safety and sanity.
Marked volatile for thread-safe lock-free access.
private final java.util.concurrent.ConcurrentMap<java.lang.String,java.util.List<java.lang.String>> _gBS_cache
Thread-safe map from (valid) baseName to breeding set.
private final MemoryTools.SimpleLRUMap<java.lang.String,ScorerIF> _gSI_cache
| Constructor Detail |
|---|
public ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers,
ScorerPopulation.NewBestCallbackIF callback,
int maxPopulationSize)
baseScorers - non-null, non-empty, pure-base-name seed setmaxPopulationSize - maximum number of Scorer details to retain in memory,
possibly excluding the fixed/base Scorers, strictly positive;
this value is advisory and may be slightly exceeded for performancecallback - used to report new best-of-breed being inserted into cache;
may be null
public ScorerPopulation(java.util.Map<java.lang.String,ScorerIF> baseScorers,
int maxPopulationSize,
SimpleExhibitPipelineIF vars)
This automatically posts new best-of-breed Scorers to the system variables event store as they are cached.
baseScorers - non-null, non-empty, pure-base-name seed setmaxPopulationSize - maximum number of Scorer details to retain in memory,
possibly excluding the fixed/base Scorers, strictly positive;
this value is advisory and may be slightly exceeded for performancevars - system variables pipeline to record new/best Scorer events to| Method Detail |
|---|
public void clear()
public ScoreAndConf getScorerWeighting(ScorerIF scorer,
boolean allowStale)
Older results (typically up to 1 day older) may be returned if allowStale is true; this may help the system overall ride out temporary I/O problems and disconnections.
These get operations are generally fast, though some cycle-stealing incremental housekeeping might be performed on this call.
scorer - the Scorer instance; never nullallowStale - if true then allow an older cache response
(up to VLONG period older)
java.lang.IllegalArgumentException - if the Scorer is null
public ScoreAndConf getScorerWeighting(java.lang.String scorerNameAndParameters,
boolean allowStale)
Older results (typically up to 1 day older) may be returned if allowStale is true; this may help the system overall ride out temporary I/O problems and disconnections.
These get operations are generally fast, though some cycle-stealing incremental housekeeping might be performed on this call.
scorerNameAndParameters - the name and parameters of the Scorer; never nullallowStale - if true then allow an older cache response
(up to VLONG period older)
public boolean putScorerWeighting(ScorerIF scorer,
ScoreAndConf sac,
SimpleExhibitPipelineIF dataSource,
SimpleLoggerIF log)
If the "put" Scorer is very good compared to the existing population members then it may get persisted/shared via the event mechanism.
This routine will only consider storing entries for Scorers whose base name is on the list we were constructed with.
Note that a put() operation does not lock out get() operations, so any updates on the main _scorerScores map must be incrementally valid, ie incorrect data must never be inserted into the _scorerScores map.
This will attempt to cache the actual Scorer instance provided which may save time reconstructing it later, and memory by better sharing. This instance may be recovered with getScorerInstance() later, though may have to be recreated (ie returning the same instance is never guaranteed).
scorer - the Scorer instance; never nullsac - rating of this Scorer across all exhibits; never nulldataSource - to record/persist/share good Scorers through;
null if no Scorer persistence is requiredlog - logger; never null
java.lang.IllegalArgumentException - if the Scorer is null or of an unsupported type
public boolean putScorerWeighting(java.lang.String nameAndParameters,
ScoreAndConf sac,
SimpleExhibitPipelineIF dataSource,
SimpleLoggerIF log)
If the "put" Scorer is very good compared to the existing population members then it may get persisted/shared via the event mechanism.
This routine will only consider storing entries for Scorers whose base name is on the list we were constructed with.
Note that a put() operation does not lock out get() operations, so any updates on the main _scorerScores map must be incrementally valid, ie incorrect data must never be inserted into the _scorerScores map.
nameAndParameters - the name and parameters of the scorer; never nullsac - rating of this Scorer across all exhibits; never nulldataSource - to record/persist/share good Scorers through;
null if no Scorer persistence is requiredlog - logger; never null
private Tuple.Pair<java.lang.Long,ScoreAndConf> _remove(java.lang.String nameAndParameters)
nameAndParameters - non-null valid Scorer value incluing optional parameters
private void _scavenge(java.lang.String largestSubPopulationBasename)
Currently this is done by elimination of genotypic near-clones by eliminating those Scorers that have very similar Scores and no sufficiently-distinct genes.
TODO: In future this may be done by elimination of phenotypic near clones by eliminating those Scorers that are have the same predction performance over a given calbration set.
This may not trim out anything from the specified sub-population.
This will never remove the 'best' item nor the base/parameterless Scorer.
Values removed by scavenging may be pushed into the back-stop LRU cache in case they are used again soon by the outside world; thus we may limit the amount of any scavengine we do in one go so as to avoid overflowing this back-stop cache.
This tries a series of more drastic pruning measures until it zaps some/enough Scorers, with the inital pruning passes being very quick and gentle.
This does not guarantee to remove any Scorers at all.
This must be called under the data lock.
Run time is expected to average about O(1) (peak O(n)) for a size-n sub-population.
largestSubPopulationBasename - private void _clearCaches()
public int size()
This does not include the count of any entries in the back-stop LRU (negative) cache.
public int maxSize()
public java.util.Set<java.lang.String> getCurrentScorersWithParameters(boolean allowStale)
The Scorers returned by this will generally be the best available, suitable for making a real prediction of and exhibit score, and will usually be the best one or handful per base-Scorer type, allowing that different Scorer types may have different "skill" domains, and excluding any with non-positive goodness.
This routine may be expensive, though will try to be efficient.
public java.util.List<java.lang.String> getBreedingSet(java.lang.String baseName)
This generally trims away all Scorers with non-positive values, but will always try to return the few best items regardless, ie the result is not completely empty unless the specified sub-population is.
The result will generally be large if the population is.
public ScorerIF getScorerInstance(java.lang.String nameAndParameters)
This does not attempt to canonicalise the names it is given, so may create and cache duplicates for example.
This can be used to create any valid Scorer instance, not just entries in the given population.
|
DHD Multimedia Gallery V1.50.55 | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||