Iterative Characterization

We can distinguish knowing something from knowing what we know about something. The latter tells us how we might access the former. We can characterize a body of knowledge to varying degrees and incrementally improve that characterization. Our method mechanically separates representations into those we recognize in some detail and those we understand only superficially.

# Method

We decompose a representation by applying a network of classifiers of varying precision. We apply classifiers in priority order, favoring the more precise but always keeping a fallback. This network becomes our model, to be iteratively improved.
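The decomposition step can be sketched in a few lines, assuming classifiers are plain predicates ordered from most to least precise. All names here are illustrative, not from the original text.

```python
def classify(item, classifiers, fallback="unrecognized"):
    """Return the label of the first (most precise) classifier that matches."""
    for label, predicate in classifiers:
        if predicate(item):
            return label
    return fallback  # the catch-all category: understood only superficially

# Classifiers ordered by precision: exact forms first, broad shapes after.
classifiers = [
    ("integer", lambda s: s.isdigit()),
    ("word",    lambda s: s.isalpha()),
]

tokens = ["42", "apple", "3.14"]
categories = [classify(t, classifiers) for t in tokens]
# "3.14" matches nothing precise and falls through to the fallback
```

The fallback guarantees every element lands somewhere, so the model is total from the first iteration onward.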

With each iterative step we choose a category of interest, distinguish members of that category by adding a more precise classifier, and repeat until interest wanes.

A sense of progress is maintained by counting the members of fallback categories and working those numbers down to insignificance.
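One iterative step might look like the following sketch, which counts category members before and after a new classifier is added; the first-match classification scheme and all names are illustrative assumptions.

```python
from collections import Counter

def characterize(items, classifiers, fallback="unrecognized"):
    """Tally how many items land in each category, fallback included."""
    counts = Counter()
    for item in items:
        label = next((lbl for lbl, pred in classifiers if pred(item)), fallback)
        counts[label] += 1
    return counts

items = ["42", "apple", "3.14", "7", "x1"]
classifiers = [("integer", lambda s: s.isdigit())]
before = characterize(items, classifiers)   # fallback holds 3 of 5 items

# Iterative step: add a more precise classifier for one fallback subgroup.
classifiers.insert(0, ("decimal",
    lambda s: s.count(".") == 1 and s.replace(".", "").isdigit()))
after = characterize(items, classifiers)    # fallback shrinks to 2
```

Watching the fallback count fall with each added classifier is the sense of progress the method relies on.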

# Performance

Successful application of this method depends on several performance characteristics.

Categories must be easily viewed.

Classifiers must be easily added.

Iterations must be fast to maintain interest.

# Examples

Iterative characterization by computer becomes valuable when a dataset is large and of interest beyond its original design. The activity is more exploring than solving, though goals regularly surface.

PEG parsing of Wikipedia. Parsing expression grammars are contextually specific about lexical forms and offer prioritized alternatives through backtracking. Exploratory Parsing provides instrumentation for category viewing, while category sampling speeds iteration.
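Prioritized alternatives with backtracking can be sketched as PEG-style combinators. This is a minimal illustration of the ordered-choice idea, not the Exploratory Parsing tool itself; the grammar and names are assumptions.

```python
def literal(s):
    """Match an exact string at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return (s, pos + len(s))
        return None
    return parse

def sequence(*parsers):
    """Match parsers one after another; fail as a unit if any fails."""
    def parse(text, pos):
        results = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None          # caller backtracks to its saved position
            value, pos = r
            results.append(value)
        return (results, pos)
    return parse

def choice(*alternatives):
    """PEG ordered choice: try alternatives in priority order."""
    def parse(text, pos):
        for p in alternatives:       # the first (most precise) match wins
            r = p(text, pos)
            if r is not None:
                return r
        return None                  # all alternatives exhausted
    return parse

# Wiki-like markup: try the precise link form first, then fall back to bold.
markup = choice(
    sequence(literal("[["), literal("link"), literal("]]")),
    sequence(literal("''"), literal("bold"), literal("''")),
)
```

Each failed alternative returns `None` without consuming input, so the next alternative retries from the same position, which is the backtracking the prose describes.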

Structure Warehousing. The transform step in ETL classifies elements as related or not, while graph database queries present categories along multiple dimensions. El Dorado extends the method across context boundaries as often found in large enterprises.
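A transform step that classifies element pairs as related or not might be sketched as below. The matching rule and records are illustrative assumptions, not El Dorado's actual pipeline.

```python
from itertools import combinations

records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "a@example.com"},   # shares an email with record 1
    {"id": 3, "email": "c@example.com"},
]

def related(a, b):
    """The precise classifier: here, a shared email address."""
    return a["email"] == b["email"]

edges, unrelated = [], []                  # categories to view and count
for a, b in combinations(records, 2):
    (edges if related(a, b) else unrelated).append((a["id"], b["id"]))

# edges feed the graph database; unrelated is the fallback to work down
# by adding further matching rules in later iterations
```

The related pairs become graph edges to load and query; the unrelated pairs are the fallback category whose count the next iteration tries to reduce.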