Quaderni di Parlaritaliano

SpLaSH - Spoken Language Search Hawk

SpLaSH (Spoken Language Search Hawk) is a toolkit for performing complex queries on spoken language corpora. It provides tools for integrating time-aligned annotations (TMA), represented as annotation graphs, with text-aligned annotations (TXA), represented as generic XML files. SpLaSH imposes very few constraints on the design of the data model, so annotations developed separately, with no dependencies on one another, can be integrated within the same dataset. It also provides a GUI that supports three types of query: a simple query on TXA or TMA structures, a sequence query on TMA structures, and a cross query on integrated TXA and TMA structures.

Compared with other corpus management systems currently available, SpLaSH offers some innovations and higher performance. It does not impose any fixed hierarchy on annotation levels beyond those implicitly defined in the data model, because it is based on the idea that each level can, in principle, be produced independently of the others. Different working groups can therefore create their labels under a minimal number of constraints and later integrate their work into a unified framework.

SpLaSH is currently being extended with a query language, SpLaSH-QL, whose formal definition is being finalized. SpLaSH-QL consists of a set of algebraic operators designed to retrieve information from integrated TXA and TMA datasets; the operators can be composed with one another and can explicitly take XPath queries as arguments. In future SpLaSH releases, the guided query interface will be based on this language, which will increase the expressive power of queries in the system.

SpLaSH is an open source project.
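The three query types mentioned above can be illustrated with a minimal Python sketch. This is not SpLaSH code and does not reproduce its API or data model: the `Arc` class, the toy `TMA` and `TXA` data, the `ref` link between a word arc and an XML element id, and the three function names are all assumptions made for illustration. Time-aligned annotations are modelled as labelled arcs between time points (the basic idea behind annotation graphs), text-aligned annotations as a small XML document queried with the XPath subset of the standard library, and the cross query joins the two by temporal containment.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Arc:
    """One time-aligned annotation: a labelled arc between two time points."""
    level: str                 # annotation level, e.g. "word" or "phone"
    label: str
    start: float               # seconds
    end: float
    ref: Optional[str] = None  # hypothetical link to a TXA element id


# Hypothetical TMA data: two levels annotated independently on one signal.
TMA = [
    Arc("word", "il", 0.0, 0.2, ref="w1"),
    Arc("word", "mare", 0.2, 0.7, ref="w2"),
    Arc("phone", "i", 0.0, 0.1),
    Arc("phone", "l", 0.1, 0.2),
    Arc("phone", "m", 0.2, 0.3),
    Arc("phone", "a", 0.3, 0.45),
    Arc("phone", "r", 0.45, 0.55),
    Arc("phone", "e", 0.55, 0.7),
]

# Hypothetical TXA data: a generic XML file with part-of-speech tags.
TXA = ET.fromstring(
    "<text><w id='w1' pos='DET'>il</w><w id='w2' pos='NOUN'>mare</w></text>"
)


def simple_query(arcs: Sequence[Arc], level: str, label: str) -> List[Arc]:
    """Simple query: all arcs on one level carrying a given label."""
    return [a for a in arcs if a.level == level and a.label == label]


def sequence_query(arcs: Sequence[Arc], level: str,
                   labels: Sequence[str]) -> List[List[Arc]]:
    """Sequence query: runs of consecutive arcs on one level whose labels
    match `labels` in order."""
    ordered = sorted((a for a in arcs if a.level == level),
                     key=lambda a: a.start)
    n = len(labels)
    return [ordered[i:i + n]
            for i in range(len(ordered) - n + 1)
            if [a.label for a in ordered[i:i + n]] == list(labels)]


def cross_query(txa: ET.Element, arcs: Sequence[Arc],
                xpath: str, level: str) -> List[Arc]:
    """Cross query: arcs on `level` that lie inside the time span of any arc
    linked to a TXA element selected by `xpath`."""
    ids = {el.get("id") for el in txa.findall(xpath)}
    anchors = [a for a in arcs if a.ref is not None and a.ref in ids]
    return [a for a in arcs
            if a.level == level
            and any(w.start <= a.start and a.end <= w.end for w in anchors)]


if __name__ == "__main__":
    print(simple_query(TMA, "word", "mare"))
    print(sequence_query(TMA, "phone", ["a", "r"]))
    # Phones realised inside words tagged as nouns in the XML layer.
    print(cross_query(TXA, TMA, ".//w[@pos='NOUN']", "phone"))
```

In this sketch the phone level knows nothing about the word level or the XML file; the only shared reference is the time axis, plus the assumed `ref` id on word arcs. That mirrors the design point made above, namely that levels produced independently can still be queried together once they are integrated.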

SpLaSH is available here