SAIL LABS took part in the virtual workshop Summarization – Key to information overload


For public services all over Europe, information exchange across language barriers is becoming increasingly important (Source: EC Europa). The Connecting Europe Facility (CEF) is a European Union fund that promotes growth, jobs and competitiveness through targeted infrastructure investment at the European level (Source: EC Europa). In 2017, the European Commission launched the CEF eTranslation Tools and Services project, which aims to help develop CEF eTranslation beyond simple translation so that it acts as a “multilingualism enabler” (Source: EC Europa). The eTranslation User Community regularly organizes dedicated meetings and workshops.

One such workshop was “Summarization – Key to information overload”, where SAIL LABS’ CEO, Christoph Prinz, presented the company’s practical solution for video summarization.

Information about the workshop including all the presentations can be found here.

What is summarization?

A summary is a short text that contains the essential information of a document. People know how to produce summaries and do so regularly, but summarizing is a hard problem for a machine. A summarizer is an algorithm that selects and presents the most important content of a document (Source: Horacio Saggion, Pompeu Fabra). Most human information is available in text form, and written text accumulated over time is the most important source of knowledge. Summaries are useful for intelligence analysts, CEOs, politicians, reporters, students and ordinary citizens, but with the deluge of information available online, producing summaries by hand is unfeasible at a large scale (Source: Horacio Saggion, Pompeu Fabra).

Automatic (or machine) text summarization is the process of creating a shorter version of a longer document while distilling its most important information. The aim of machine summarization is to produce summaries as good as those a human would write (Source: Machine learning mastery; Advances in Automatic Text Summarization; Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding).
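To make “distilling the most important information” concrete, here is a minimal sketch of an extractive summarizer in Python that uses a simple frequency-based (empirical) heuristic: sentences built from words that occur often in the document are assumed to be more important, and the top-scoring ones are returned in their original order. This is an illustration only, not SAIL LABS’ method; the function names, the tiny stopword list and the naive regex sentence splitter are hypothetical simplifications.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it",
             "that", "for", "on", "with", "as", "by"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, num_sentences: int = 2) -> list[str]:
    """Return the num_sentences highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    # Document-wide term frequencies serve as the relevance signal.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average frequency, so longer sentences are not automatically favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]


doc = ("Summarization reduces reading time. "
       "Automatic summarization selects important sentences. "
       "The weather was pleasant yesterday. "
       "Summarization systems score sentences by word frequency.")
print(summarize(doc, num_sentences=2))
# → ['Summarization reduces reading time.', 'Automatic summarization selects important sentences.']
```

The off-topic sentence about the weather scores lowest because none of its words recur elsewhere in the text. Averaging rather than summing frequencies is a design choice that keeps long sentences from winning by length alone.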

Machine summarization opens up a wide range of opportunities.

Why do we need automatic text summarization?

  • Summaries reduce reading time.
  • When researching documents, summaries make the selection process easier.
  • Automatic summarization improves the effectiveness of indexing.
  • Automatic summarization algorithms are less biased than human summarizers.
  • Personalized summaries are useful in question-answering systems because they provide information tailored to the user.
  • Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of texts they are able to process.

(Source: Machine learning mastery; Automatic Text Summarization)

Some approaches to automatic text summarization:

  • Empirical approaches: relevance is measured by features such as how often a word or term is repeated, the structure of the document or a fixed vocabulary. These features are usually combined.
  • Graph-based techniques: the text is treated as a connected structure in which the lexical similarity or difference between sentences indicates their relevance.
  • Knowledge-based approaches: knowledge structures are used to interpret the text.
  • Data-driven/Deep Learning approaches: models are built through experimentation and training on datasets.

(Source: Horacio Saggion, Pompeu Fabra).
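The graph-based idea can be sketched in Python in the style of TextRank: each sentence becomes a node, edges are weighted by word overlap between sentences (Jaccard similarity, chosen here as a simplifying assumption), and a PageRank-style power iteration ranks the sentences. The function names and the damping and iteration parameters are hypothetical; this is not the system presented at the workshop.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def word_set(sentence: str) -> set[str]:
    """Lowercase set of words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: shared words divided by all distinct words."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def textrank(text: str, num_sentences: int = 2,
             damping: float = 0.85, iterations: int = 50) -> list[str]:
    """Rank sentences with a PageRank-style walk over a similarity graph."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= num_sentences:
        return sentences
    bags = [word_set(s) for s in sentences]
    # Weighted adjacency matrix: edge weight = lexical similarity, no self-loops.
    weights = [[similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence collects score from neighbours, proportional to edge weight.
        # Isolated sentences (out_sum == 0) pass nothing on and keep only the base score.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


doc = ("Summarization reduces reading time. "
       "Automatic summarization selects important sentences. "
       "The weather was pleasant yesterday. "
       "Summarization systems score sentences by word frequency.")
print(textrank(doc, num_sentences=2))
```

Here relevance comes from how strongly a sentence is connected to the rest of the text rather than from raw word counts: the weather sentence shares no words with any other sentence, so it stays at the minimum score and is never selected.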

SAIL LABS’ CEO, Christoph Prinz, gave a presentation titled “Television Summarization – A Practical Example”, in which he spoke about SAIL LABS’ solution for audio, text and video processing. His presentation can be found here.

We are very glad to have taken part in the workshop and to have exchanged experience and knowledge with other professionals, and we look forward to attending other conferences and workshops!