Knowledge Discovery

Data-driven knowledge discovery forms an integral, bottom-up part of the Neoclassica research framework: it uses algorithms from statistical analysis and machine learning to identify cultural patterns such as constructional or ornamental features. In particular, it has the potential to uncover hitherto unknown patterns in the source data and to help challenge preconceived notions about the dissemination of forms and the evolution of stylistic patterns.

We are currently experimenting with several algorithms for classifying objects and their features in images, among them deep learning approaches based on both Convolutional Neural Networks (CNNs) and Region-based Convolutional Neural Networks (R-CNNs). Our current hypothesis is that while CNNs may be well suited to classifying an image as a whole, R-CNNs, which localize and classify individual regions, will provide a better foundation for classifying multiple objects within one image as well as the features of a single object. In the future we intend to combine this analysis with semantic technologies to provide a truly multimodal analysis.
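
The practical difference between the two approaches shows in their output: a CNN classifier assigns labels to the image as a whole, whereas an R-CNN-style detector scores many candidate regions and keeps a set of labelled boxes. The sketch below, in plain Python, shows one standard post-processing step of such detectors: intersection over union (IoU) and greedy per-label non-maximum suppression. It is an illustration under assumptions, not the project's pipeline; the `Detection` class, the labels `chair` and `lyre_back`, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One scored region (hypothetical structure): box corners in pixels plus a label."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(dets: list[Detection], threshold: float = 0.5) -> list[Detection]:
    """Greedy per-label NMS: keep the best box, drop same-label boxes overlapping it too much."""
    kept: list[Detection] = []
    for det in sorted(dets, key=lambda d: d.score, reverse=True):
        if all(k.label != det.label or iou(k, det) < threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detector output for one image.
candidates = [
    Detection(10, 10, 110, 210, "chair", 0.92),
    Detection(15, 12, 112, 205, "chair", 0.80),    # near-duplicate of the first box
    Detection(30, 20, 90, 70, "lyre_back", 0.75),  # an ornamental feature inside the chair
]
for d in non_max_suppression(candidates):
    print(d.label, d.score)
```

Because suppression runs per label, a box for an ornamental feature can survive inside a larger furniture box, which is exactly the "object plus features" case that a single image-level label cannot express.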

The major impacts of automating the classification of forms and artefacts for researching and curating objects will be:

  • to shift the focus from conspicuous pieces of art to a broader perspective on art as material culture;
  • to provide a method for exploring the vast number of under-documented objects by relating them to each other on the basis of formal features;
  • to provide a method for analysing existing corpora (catalogues raisonnés, museum collections) to better understand how their composition reflects the formation of cultural memory (particularly if combined with a multimodal analysis of text and image).
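
Relating objects "on the basis of formal features" can be pictured as a nearest-neighbour search over feature vectors. The following is a minimal sketch assuming each object has already been reduced to a numeric vector (for instance, per-feature scores or an embedding taken from a trained network); the object names, the four feature dimensions and their values are invented for illustration, and cosine similarity is only one possible choice of measure.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(query: str, features: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank all other objects by similarity of their formal features to the query object."""
    q = features[query]
    scored = [(name, cosine(q, vec)) for name, vec in features.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical feature vectors, e.g. presence scores for
# [sabre legs, lyre-shaped back, gilt mounts, caryatid supports].
features = {
    "undocumented_chair": [0.9, 0.8, 0.1, 0.0],
    "museum_chair_A": [0.8, 0.9, 0.0, 0.1],
    "museum_table_B": [0.0, 0.1, 0.9, 0.8],
    "pattern_book_plate_C": [0.7, 0.6, 0.2, 0.0],
}
print(nearest_neighbours("undocumented_chair", features))
```

Here the undocumented chair's closest matches are the pattern-book plate and the museum chair, the kind of link between documented and under-documented material that the second point above describes.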

Building a knowledge discovery module for any discipline means constantly balancing intellectual property rights against openness. To provide a corpus that is rich yet open, we chose a multilayered approach.
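
The text does not spell out the layers of this approach. One building block such an approach plausibly needs, sketched here as an assumption rather than the project's implementation, is a filter that admits a record into the open corpus only if its rights status is explicitly public domain. The field names `objectID`, `isPublicDomain` and `primaryImage` are modelled on the Metropolitan Museum of Art's public Collection API; other sources would need their own mapping, and the URLs are placeholders.

```python
from typing import Any


def open_subset(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only records explicitly flagged as public domain that also carry an image URL.

    A missing flag excludes the record: unknown rights status is treated
    as restricted rather than open.
    """
    return [
        r for r in records
        if r.get("isPublicDomain") is True and r.get("primaryImage")
    ]


# Hypothetical records; field names modelled on the Met Collection API.
records = [
    {"objectID": 1, "isPublicDomain": True, "primaryImage": "https://example.org/1.jpg"},
    {"objectID": 2, "isPublicDomain": False, "primaryImage": "https://example.org/2.jpg"},
    {"objectID": 3, "primaryImage": "https://example.org/3.jpg"},  # rights status unknown
    {"objectID": 4, "isPublicDomain": True, "primaryImage": ""},  # no image to use
]
print([r["objectID"] for r in open_subset(records)])  # [1]
```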

This is an ongoing activity in which we collate a corpus from public domain sources, including but not limited to the Metropolitan Museum of Art, New York, which recently released a considerable collection of 375,000 images for public use, and digitized, freely available period source books. These images are augmented according to the needs of the particular algorithm applied to them.
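
What augmentation means depends on the algorithm. As a minimal sketch, assuming a grayscale image represented as a list of pixel rows (a hypothetical stand-in for real image arrays), the code below derives several training variants from one image by random cropping and horizontal flipping. In practice, library transforms such as those in torchvision would play this role; whether flipping is appropriate at all is itself an assumption, since it would be wrong for features whose orientation matters.

```python
import random

Image = list[list[int]]  # hypothetical stand-in: a grayscale image as rows of pixel values


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, h: int, w: int, rng: random.Random) -> Image:
    """Cut out an h x w window at a random position inside the image."""
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def augment(img: Image, n: int, crop: tuple[int, int], seed: int = 0) -> list[Image]:
    """Produce n variants of one image by random cropping and a coin-flip horizontal mirror."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    out = []
    for _ in range(n):
        variant = random_crop(img, *crop, rng)
        if rng.random() < 0.5:
            variant = hflip(variant)
        out.append(variant)
    return out


image = [[r * 4 + c for c in range(4)] for r in range(4)]
variants = augment(image, n=3, crop=(3, 3))
print(len(variants), len(variants[0]), len(variants[0][0]))  # 3 3 3
```
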
The data for each of these experiments, as well as the trained classifiers (in the case of deep learning experiments), will be made publicly available.
Once completed, the whole Neoclassica-open corpus will also be released as open data.
So as not to rely solely on the somewhat arbitrary selection of artefacts accumulated by museums and galleries, we also aim to digitize and analyse a multimodal corpus from a major historical ensemble. To this end, we have a partnership aimed at acquiring funding to digitize the interiors and furnishings of the Dessau-Wörlitz UNESCO World Heritage Site.
We conceive of corpus annotation as being in constant dialogue with ontology development, so that the findings of the domain experts curating the data enrich the understanding of the domain as a whole. The concepts applied during annotation will be taken directly from the Neoclassica ontology and describe types of artefacts.
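
Taking annotation concepts directly from the ontology suggests a simple check at annotation time. The sketch below assumes a hypothetical excerpt of artefact-type concepts (`ONTOLOGY_CONCEPTS`; the real identifiers would come from the Neoclassica ontology) and splits incoming (image, label) pairs into valid annotations and unknown labels. Counting the unknown labels rather than silently dropping them is one way to feed annotators' findings back into ontology development.

```python
from collections import Counter

# Hypothetical excerpt of artefact-type concepts; the real identifiers
# would be taken from the Neoclassica ontology itself.
ONTOLOGY_CONCEPTS = {"Chair", "Armchair", "Sofa", "Commode", "Secretary", "Table"}


def check_annotations(annotations: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], Counter]:
    """Split (image_id, label) pairs into valid annotations and a count of unknown labels.

    Unknown labels are counted rather than discarded so that recurring ones
    can be discussed with ontology developers as candidate new concepts.
    """
    valid = [(img, label) for img, label in annotations if label in ONTOLOGY_CONCEPTS]
    unknown = Counter(label for _, label in annotations if label not in ONTOLOGY_CONCEPTS)
    return valid, unknown


# Hypothetical annotation batch.
annotations = [
    ("img_001", "Chair"),
    ("img_002", "Commode"),
    ("img_003", "Daybed"),
    ("img_004", "Daybed"),
    ("img_005", "Table"),
]
valid, unknown = check_annotations(annotations)
print(len(valid), unknown.most_common())  # 3 [('Daybed', 2)]
```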