The Summary Report from the conference Fake News & other AI Challenges for the News Media in the 21st Century

Introduction

The News Media are confronted with major challenges and opportunities arising from game-changing technological developments. This event showcased how AI-powered technologies could help news organizations safeguard the truthfulness and trustworthiness of their sources, stay in control of the news gathering and delivery process and empower their newsrooms, their journalists and, ultimately, their readership.

Speakers

  • Philippe Wacker, Executive Director @ LT-Innovate.org
  • Gari Owen, President @ EUROSINT Forum
  • Vincent Tripodi, VP, Engineering @ Associated Press (AP)
  • Andrew Secker, Language Technology Lead @ BBC News Labs
  • Kristof Varga, Director @ Bakamo Social Public
  • Stuart W. Shulman, Founder & CEO @ Texifter and inventor of DiscoverText
  • Gerald Czech, Head of New Media and Campaigning @ Austrian Red Cross
  • Markus Glanzer, Head of Department for European Civil Protection Mechanism and Deputy Federal Commander in Charge @ Austrian Red Cross
  • Peter Cochrane, Professor of Sentient Systems @ The University of Suffolk
  • Andreas Ventsel, Senior Researcher @ The University of Tartu
  • Vladimir Sazonov, Senior Researcher @ The University of Tartu
  • Mark Pfeiffer, Chief Visionary Officer @ SAIL LABS Technology
  • Vassilis Kappis, Lecturer in Security and Intelligence Studies @ The University of Buckingham
  • Christian Gsodam, Advisor @ Austrian Presidency of the European Council
  • Iryna Gurevych, Director @ UKP Lab in the Department of Computer Science at the Technical University Darmstadt
  • Andreas Hanselowski, Researcher @ the UKP Lab in the Department of Computer Science, Technical University Darmstadt

Presentations

Philippe Wacker

Where Language Intelligence Meets Business

In his opening speech, Philippe Wacker introduced LT-Innovate, the Language Technology Industry Association, founded in 2012. Language technology can be understood as consisting of three aspects: multi- and cross-lingual processing (contexts where language-specific solutions are needed), interactive communication (processing and analysis of spoken language, automated production and analysis of voices for robots, etc.) and intelligent content (processing and analytic operations using natural language processing, or NLP).

One of the aims of LT-Innovate is to promote language technologies as a driving force not only of economic profit but also of societal well-being and cultural integrity. It also seeks to encourage collaboration within the industry and to strengthen the industry itself; its members include language technology suppliers from 25 countries, users, applied researchers, integrators and others.

Check Philippe’s presentation for more details regarding the Language Technology Industry Association.

Gari Owen

EUROSINT: Aims & Achievements

As president of the EUROSINT Forum, established in 2006, Gari Owen introduced the goals and activities of this independent, not-for-profit association. One of the goals of EUROSINT is to create a European intelligence ecology dedicated to promoting OSINT for preventing risks and threats to peace and security. The EUROSINT Forum also seeks to bring together specialists and users, contribute to better communication and promote thinking about the development of European Union policies on the use of OSINT in the security sphere.

To date, the EUROSINT Forum has built a network of more than 500 contacts, organized 50 seminars of interest to the OSINT community and been involved in various projects.

Check Gari’s presentation for more details regarding EUROSINT Forum.

Vincent Tripodi

Newsroom Technology in 2020 – The Singularity is Not Near

Vincent Tripodi noted that three categories of fake-news-related challenges drive conversations at the Associated Press (AP): fact-checking, content validation and automated content creation. Vincent mentioned that manual fact-checking, which consists of searching for quotes and previous articles, is cumbersome and slow, and that AP uses various manual tools which are not interconnected.

Vincent offered a real-world example of a fake story that would have been much easier and quicker to recognize had a social media recognition tool or a database been available. He mentioned that journalists spend more time validating content than telling the story itself. Vincent then talked about the steps AP takes when validating user-generated content (UGC), specifically video: assessing the contributor’s credibility, checking metadata as evidence and reviewing the content’s originality and accuracy.

AP began the development of AP Verify in 2018 with joint funding from the Google Digital News Initiative. The vision of AP Verify is to automate the parts of the fact-checking process that deal with source identification (finding the earliest version of the information, assessing the credibility of the source) and content verification (whether the location, date and time are correct and whether the material is consistent with other media). The design of the cloud-based newsroom tool allows it to take advantage of machine learning technologies to help identify the original source of UGC and estimate the probability that it is real.
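
To make the verification workflow described above more concrete, below is a minimal, hypothetical sketch of the kind of automated consistency checks such a tool might surface for a journalist. The field names, thresholds and checks are invented for illustration; they do not describe AP Verify’s actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class UGCItem:
    contributor_verified_items: int   # previously verified submissions by this contributor
    claimed_capture_time: datetime    # when the uploader says the video was shot
    metadata_capture_time: datetime   # capture time found in the file metadata
    claimed_location: str
    metadata_location: str

def verification_flags(item: UGCItem) -> list:
    """Return human-readable flags for a journalist to review."""
    flags = []
    if item.contributor_verified_items == 0:
        flags.append("first-time contributor: credibility unknown")
    drift = abs((item.claimed_capture_time - item.metadata_capture_time).total_seconds())
    if drift > 24 * 3600:
        flags.append("claimed capture time differs from metadata by more than a day")
    if item.claimed_location.strip().lower() != item.metadata_location.strip().lower():
        flags.append("claimed location does not match metadata location")
    return flags

item = UGCItem(
    contributor_verified_items=0,
    claimed_capture_time=datetime(2018, 9, 1, 12, 0, tzinfo=timezone.utc),
    metadata_capture_time=datetime(2018, 8, 25, 9, 30, tzinfo=timezone.utc),
    claimed_location="Vienna",
    metadata_location="Budapest",
)
print(verification_flags(item))
```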

The aim of AP Verify is to help AP and its customers verify UGC eyewitness media as quickly as possible, with the goal of extending its capabilities to the verification of all online content. AP plans to make AP Verify available globally to all newsrooms, which will not only save journalists time but also increase the use of and trust in eyewitness media, as well as Truth Velocity. AP Verify is currently in development and should integrate easily with multiple other platforms and APIs. The first version, for AP and pilot customers, could be available in the first quarter of 2019.

Check Vincent’s presentation for more details concerning newsroom technology.

Andrew Secker

SUMMA: Enabling Journalists Through Multilingual Media Monitoring

According to Andrew Secker, BBC Monitoring (a division of the BBC that monitors and reports on mass media worldwide) enables the BBC and its partners to monitor who is saying what in more than 150 countries and 100 languages, and to understand why. BBC Monitoring covers 13,580 distinct sources, of which approximately 1,500 are television sources and 1,350 are radio sources. As Andrew noted, fact-checking them would be time-consuming for journalists, as current monitoring processes are largely manual.

Andrew introduced the Scalable Understanding of Multilingual Media project (SUMMA), which aims to improve media monitoring through the automatic analysis of media streams across many languages, enabling media monitoring at scale with the help of language technologies. SUMMA proceeds in three steps: 1) Ingest (live video segmentation, automatic speech recognition in nine languages), 2) Translate (machine translation into English from nine languages) and 3) Enrich (story clustering, cluster and story summarization, named entity extraction, topic detection, knowledge base population and automated fact-checking). Andrew went on to show the SUMMA interface, underlining that SUMMA is available as an open-source project.
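
As an illustration of the three-stage structure described above, here is a minimal sketch of an Ingest/Translate/Enrich pipeline. The placeholder functions stand in for SUMMA’s real speech recognition, machine translation and NLP components, which are separate services in the actual open-source platform.

```python
def ingest(item: dict) -> dict:
    # Stand-in for live video segmentation and automatic speech recognition:
    # here we simply treat the provided text as the recognized transcript.
    item["transcript"] = item.get("text", "")
    return item

def translate(item: dict) -> dict:
    # Stand-in for machine translation into English (identity for English input).
    item["text_en"] = item["transcript"]
    return item

def enrich(item: dict) -> dict:
    # Stand-in for named entity extraction, topic detection, clustering, etc.
    item["entities"] = [w for w in item["text_en"].split() if w.istitle()]
    return item

def monitor(item: dict) -> dict:
    for stage in (ingest, translate, enrich):
        item = stage(item)
    return item

print(monitor({"text": "Protests were reported in Kiev according to Reuters"}))
```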

Check Andrew’s presentation for more details regarding the SUMMA project.

Kristof Varga

Beyond Identifying, Tracking, Blocking and Debunking

The director of Bakamo.Social noted that disinformation comes out of the cyber environment and enters reality, where it affects people’s minds and hearts. He continued with an example from a study of the media coverage of the 2017 French elections, where he described three types of sharing behavior on social media: repeat, mission and provoke. Repeat refers to a user sharing the URL of an article without adding any personal comment or interpretation. Mission occurs when a user adds a personal comment to frame the article. Provoke describes articles being shared with the aim of hurting and humiliating users who hold different views. Kristof mentioned that, according to the study, the first type of behavior (repeat) was the most widespread way of sharing information online with regard to the French elections. The study also pointed to three possible reasons why people share content: they seek community, identity, or self-worth and competence.

According to Kristof, AI can help with tasks that would take people too long to complete, such as identifying dominant themes in the media. His experience shows that, once taught, AI is very efficient and can categorize an entire dataset. Other benefits include the availability of sentiment analysis, the possibility of using different categories and the fact that findings are supported by quantitative data. Among the disadvantages, Kristof mentioned that AI needs data to be trained on, that sentiment analysis does not inform about the commenter (it only describes the tone of the comment) and that there is little information about how the categories are constructed; he also noted that each project is different and deals with complex social issues.
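
Below is a minimal sketch of the “teach it on a labeled sample, then let it categorize the whole dataset” workflow Kristof describes, using scikit-learn; the themes and posts are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small hand-labeled training sample ("teaching" the model).
labeled_posts = [
    ("the candidate's economic plan will create jobs", "economy"),
    ("unemployment figures are the real story here", "economy"),
    ("border control dominated tonight's debate", "immigration"),
    ("new asylum rules announced by the ministry", "immigration"),
]
texts, themes = zip(*labeled_posts)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, themes)

# Once taught, the model can label the full (much larger) dataset quickly.
unlabeled = ["tax cuts promised for small businesses",
             "visa quotas are to be tightened next year"]
print(list(zip(unlabeled, model.predict(unlabeled))))
```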

Check Kristof’s presentation for more details concerning Bakamo.Social.

Stuart Shulman

Humans & Machines Learning Together

The founder and CEO of Texifter started his presentation with an example of fact-checking President Trump’s interview in The Washington Post and pointed out the wide spectrum of approaches and methods in text analysis. Stuart compared text analytics to filtering and posed a question: “How do you know if a human is right, and how do you know that the computer was trained by humans who are right?”

He introduced the Coding Analysis Toolkit (CAT), a free and open-source text analysis service hosted by Texifter. CAT makes it possible to code raw text datasets, annotate the coding, measure inter-rater reliability and more. Stuart noted that text classification is a 2,500-year-old problem and that scholars in particular struggle with the volume of data. For both humans and machines, validation is vital, and it should be borne in mind that automated models do not replace humans but enhance their abilities.
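
As a small, self-contained example of the inter-rater reliability measurement that CAT supports, the sketch below computes Cohen’s kappa for two coders; the coded labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n)
                   for lab in set(coder_a) | set(coder_b))
    return (observed - expected) / (1 - expected)

coder_a = ["fake", "real", "real", "fake", "real", "fake"]
coder_b = ["fake", "real", "fake", "fake", "real", "real"]
print(f"Cohen's kappa: {cohens_kappa(coder_a, coder_b):.2f}")  # 0.33
```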

Stuart highlighted the benefits of crowdsourcing for the process of text analysis. Crowdsourcing is also essential to another Texifter product, DiscoverText, which provides cloud-based software tools for assessing large amounts of unstructured textual data and offers multilingual, text mining, data science, human coding, annotation and machine-learning features. The CoderRank software, for which Stuart was awarded a patent in collaboration with Mark J. Hoy, represents a unique way to rank humans on trust and knowledge vectors.

To sum up, CAT and DiscoverText can contribute to the study of fake news and AI through the free and open-source option (CAT) introduced above, other web-based crowdsourced collaborative tools, innovation in measurement, free Twitter data collection, random sampling, keystroke coding, various options for advanced search and filtering, clustering algorithms, machine-learning classifiers and more.

Check Stuart’s presentation for more details concerning Texifter.

Gerald Czech and Markus Glanzer

Fact or Fake? Who cares?

Gerald and Markus from the Austrian Red Cross provided a different perspective on the fake news problem, as both represent end users without a developer background. They suggested that nowadays, given the huge volume of news, people can be overnewsed yet underinformed at the same time. As the mission of the Red Cross around the world is “To improve the lives of vulnerable people by mobilizing the power of humanity”, there are today also other approaches to bringing that humanity to people.

With regard to social media, Gerald does not see it as just another channel of communication but rather as a symptom of a radical modern change in communication. The Thomas theorem, which states that if men define situations as real, they are real in their consequences, was mentioned in this context. As there is no hierarchy, the structures are changing. The idea of “Schroedinger’s Digital Cat” was also brought up: by gathering information, we change the situation itself.

How is all of this connected to the Red Cross and the fake news phenomenon? Markus noted that fake news has three layers of impact: strategic, tactical and operational. The question relevant for the Red Cross is: can we trust the information? He went on to show participants the interface of the Red Cross’s own systems and mentioned that the organization also uses social media data. The problem of fake news was illustrated through examples attacking the international humanitarian movement’s work. One such story accused the Red Cross, along with Bill and Hillary Clinton, of stealing money intended for Haitians. Another example was a fake video that went viral, showing trucks loaded with large amounts of cash in Red Cross boxes.

Check Gerald’s and Markus’ presentation for more details regarding the Red Cross.

Peter Cochrane

How to Build a Truth Engine

The Professor of Sentient Systems at the University of Suffolk gave a remote presentation via Skype, in which he examined the threats posed by fake news and the possibilities of combating it. Peter mentioned that truth can be dynamic and both certain and uncertain, and can therefore be understood as a dynamic binary. Moreover, truths are hard won because they demand deep thought, energy, freedom of thought, education, the ability to reason and the acceptance of debate. Accepting the ridiculous, on the other hand, takes far less thought, effort and energy.

As mentioned above, lies and fake news are dangerous because they do not require thinking, are easy to grasp, play to dispositions and reinforce prejudices. Furthermore, they are growing in volume and spread faster, further and deeper than validated information. The danger of fake news lies in the fact that there is no place for it: government, media, society, industry, institutions, defense, technology and engineering need to be built on the grounds of science, discovery and fundamental truths.

Peter noted that everything and everyone is connected: “In a networked and fast-moving world, simple-minded thinking, approximations, ignorance and lies are dangerous – they destroy companies and countries.”

When it comes to countermeasures, the same technologies and techniques should be used to counter fake news, but fully automated and applied on a global scale. There is a need to continually scan the internet to purge errors and to identify liars, criminals, AI lie generators and persistent offenders. Another important need is to publish veracity ratings and to attenuate rogue users, rogue states and extreme channels. These steps would enable the construction of a truth engine. The question is: will we be able to devise lie recognition using Bayesian inference, simple AI, correlations and trending analysis?
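
To illustrate the Bayesian inference idea Peter raises, the toy sketch below updates the probability that a claim is true from the track records of the sources carrying it. The prior and the per-source reliability figures are invented; a real truth engine would have to estimate them from history and trending analysis.

```python
def claim_posterior(prior_true, source_reports):
    """Each report is (P(source carries claim | claim true), P(... | claim false))."""
    p_true, p_false = prior_true, 1.0 - prior_true
    for p_given_true, p_given_false in source_reports:
        p_true *= p_given_true
        p_false *= p_given_false
    return p_true / (p_true + p_false)

# Two fairly reliable outlets and one known rumour mill all carry the claim.
reports = [(0.90, 0.20), (0.85, 0.30), (0.60, 0.70)]
print(f"P(claim is true | reports) = {claim_posterior(0.5, reports):.2f}")  # ~0.92
```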

According to Peter, “Truth is vital for our planet and the survival of all life forms including us.” To build such a truth engine we would need: a meaning extractor for context (an inference engine), primary source identification (fact checkers), information on style, such as behavior, history and publications analysis, and trending analysis for the veracity rating. The core questions when assessing veracity, accuracy and credibility concern the author and their reputation: who do they work for, is the content sponsored and do they cite trusted sources? The follow-up question is: could all of this be automated?

Peter then walked through the necessary steps and identified the obstacles. One requirement of the truth engine would be a library of lies: a comprehensive record of all lies documented over time, needed as a reference source for both human and AI information sources. Such a library does not yet exist and would require a coordinated, automated, global effort. A further question is whether it is possible to rank entities’ truthfulness and accuracy against validated facts by automating the process, in order to produce a confidence rating or an uncertainty percentage.

Furthermore, Peter noted that facts and truths sometimes do not matter, which may be due to the “illusory truth effect”: when falsehoods are repeated so often that people start to equate them with the truth. He elaborated on estimates of the truth quality of various newspapers and added that automating these processes will not be easy but appears achievable, with AI the most plausible technology for doing so. To sum up, Peter noted that many components of the truth engine already exist or could be built, although it is essential to build multiple truth engines with different configurations, components and AI.

Check Peter’s presentation for more details regarding the Truth Engine.

Andreas Ventsel

Affective Communication in Infowarfare

Andreas Ventsel, senior researcher at the University of Tartu, began by introducing post-truth rhetoric, information overload and information fatigue. Post-truth news culture is characterized by appeals to emotion rather than to knowledge-based information and arguments. Post-truth rhetoric, in turn, covers strategies for gaining visibility by promoting certain framings in order to persuade the audience. Finally, information fatigue, or information fog, refers to a type of reaction to information overload.

Social media can be particularly suited to affective communication. This type of communication can be understood as a cognitive atmosphere of short messages eliciting emotional identification; it is based on sharing immediate reactions and emotions. Social media messages often do not convey new information but consist of fragments of known information, and they can anchor topics in society through personal associations or emotions.

Images can be especially powerful, as they are able to evoke affective reactions and engage the human non-conscious meaning-generating apparatus. Abstract images that distort our spatial perception can attract attention and thereby help spread content, even when that content contains misinformation. Visual manipulation occurs at the level of reference, and iconic images are understood as representations of emotional identification.

Specifically, since messages that combine verbal and pictorial information can guide recipients’ interpretation, they can easily give rise to misinformation designed to meet emotional needs, reinforce existing beliefs and so on; the iconic images mentioned above can then function as triggers of interpretative frames or connotations. Andreas suggested four research avenues for combating the spread of misinformation: cultural memory research, semiotic approaches, psychological research and Internet research.

Check Andreas’ presentation for more details concerning affective communication.

Vladimir Sazonov

Russian information operations against Ukraine in 2014-2015: tools and techniques

Vladimir Sazonov, senior researcher at the University of Tartu and a leader of the project “Russian Information Campaign Against the Ukrainian State and Defence Forces”, prepared by the NATO Strategic Communications Centre of Excellence, presented a brief overview of the Russian tools and techniques applied in Ukraine. Vladimir mostly presented results based on the analysis of interviews with various Ukrainian experts.

According to Vladimir’s data, Russia had started preparing for a possible military conflict with Ukraine a few years before the conflict itself. He divided the campaign into phases: “a preparatory phase”, “informational sounding (exploring) of the situation”, “creation of an informational lodgment”, “a phase of informational aggression”, “rocking the situation in Donbass” and finally “wide pressure of information”. He noted that the tools and techniques included the creation of fake homepages and portals by pro-Russian insurgents of the Donetsk and Luhansk republics, cyber-attacks by Russia and the separatist republics, and more. Vladimir stressed the role of separatist mass media in the information war and mentioned that Russian intelligence operations are situational in nature.

Check Vladimir’s presentation for more details regarding the Russian information operations.

Mark Pfeiffer

OSINT & AI to Support the Analyst in the Ever More Digital Age

Mark Pfeiffer, Chief Visionary Officer of SAIL LABS Technology, introduced the Media Mining System and how it can be used to identify and combat fake news. The company is a leading global provider of open source intelligence (OSINT) systems, automatic speech recognition, and media monitoring and analysis.

Mark went on to introduce the term open source intelligence and showed the processing roadmap of the Media Mining System, which covers 32 languages. The cross-media, multilingual and customizable system can process inputs in various forms, such as TV, radio, feeds, blogs, YouTube, Flickr and social media platforms like Twitter, Facebook and Instagram. The Media Mining Feeder is responsible for converting these formats into images, video, audio or text. The system performs cleaning and normalization of data as well as visual and audio processing tasks such as segmentation, automatic speech recognition, language identification and speaker identification. For text processing, the Media Mining Indexer carries out named entity detection, story segmentation, topic detection and translation, and instantly provides users with the results of various sentiment analyses regarding the object of their interest.

After introducing SAIL LABS’ Media Mining System, Mark performed a live demo. The Ukrainian political developments of 2013 and 2014, together with the Crimea crisis, were chosen as examples of how OSINT can provide actionable insights. As demonstrated, the system can display data coming from traditional media, internet articles, satellite TV, social media and other sources, and provide accurate results of various sentiment analyses, keeping the user informed about the situation in real time.

Concerning the fight against fake news, the first source of any piece of information spreading on the Internet can be quickly identified with SAIL LABS’ clustering feature. Combined with the option of displaying the results of various sentiment analyses, this can be a valuable tool for fighting fake news, as it can identify polarizing narratives. Another feature that can help detect disinformation online is the Bot Factor, based on an algorithm developed by SAIL LABS, which determines the probability of a Twitter user being a bot. A further element in combating fake news is the Photoshop Detector, which informs users that a picture has been manipulated, based on its associated metadata.
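
SAIL LABS’ Bot Factor algorithm itself is proprietary, so the sketch below is only a toy heuristic illustrating the kind of account features such a score might weigh (posting rate, account age, follower ratio). All thresholds and weights are invented.

```python
def bot_factor(tweets_per_day: float, account_age_days: int,
               followers: int, following: int) -> float:
    """Return a rough 0..1 score; higher means more bot-like."""
    score = 0.0
    if tweets_per_day > 100:                      # inhumanly high posting rate
        score += 0.4
    if account_age_days < 30:                     # very young account
        score += 0.3
    if following > 0 and followers / following < 0.1:
        score += 0.3                              # follows many, followed by few
    return min(score, 1.0)

print(bot_factor(tweets_per_day=250, account_age_days=12,
                 followers=40, following=2000))   # 1.0 for this suspicious profile
```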

To conclude, Mark suggested that humans will continue to play a vital role alongside AI. He noted that SAIL LABS systems support the processing of the growing amount of data that appears in various forms and languages.

Check Mark’s presentation for more details regarding OSINT and AI.

Vassilis Kappis

Media and Security Crises

The lecturer in Security and Intelligence Studies at the University of Buckingham offered possible definitions of a security crisis. Crises are important because they usually begin with a conflict of interest or a dispute and can later escalate into armed conflict. Crises can be defined as situations involving stress and complex circumstances, so decisions are likely to be affected by the psychological predispositions of individuals. He mentioned that rationality can be compromised in times of crisis and that policy-makers may be more prone to disregard information that contradicts their previously held beliefs.

Vassilis introduced the CNN effect, which he described as the publication of shocking material about humanitarian crises around the world that compels U.S. policy-makers to intervene in humanitarian situations in which they would otherwise have no interest. He illustrated a battle of narratives with the example of Georgia. On one side is the Georgian narrative of a young, liberal democracy that aspires to join the European Union and NATO and has to counter terrorist threats originating in secessionist regions. On the other is the Russian narrative of neoconservatives encroaching on Russia’s neighbors while Russian citizens need to be protected. Vassilis noted that the crisis in Crimea can be regarded as an example of a new kind of war that uses pervasive disinformation.

An approach to conflict that simultaneously deploys various instruments of war, including mass political manipulation as information warfare, may be called hybrid, multi-dimensional, new-generation or irregular. This new warfare differs from traditional methods in several respects, such as the timing of military action, the types of clashes and the aim of the operation. For instance, in new warfare military action starts during peacetime rather than after a declaration of war, as was typical of traditional methods. Moreover, there is a shift in focus from direct destruction to direct influence, with an emphasis on psychological or information warfare combining political, economic, information, technological and ecological campaigns.

To sum up, Vassilis suggested that the dynamics of security crises can be shaped through the perceptions of decision-makers. As a result, the information provided to decision-makers should be filtered during security crises, since it is impossible to prevent fake news or propaganda completely.

Check Vassilis’ presentation for more details concerning the media and security crises.

Christian Gsodam

Are Fake News Threatening our Democracy?

The advisor to the Austrian Presidency of the European Council suggested that we need a strong, public action platform. Christian noted that the Brexit referendum, the German elections and rising pressure in certain regions were accompanied by massive attacks from actors abroad, which is why countering fake news and disinformation is very important for the European Union (EU).

A considerable amount of political energy has been invested in making the EU more democratic. As an example of democratic processes in the EU, Christian explained that since the Treaty of Lisbon, which entered into force in 2009, every piece of EU legislation is voted on by the European Parliament, which is elected by voters in the member states. Moreover, the Council of the European Union is controlled by national parliaments, as it consists of ministers of the member states. Despite these democratic processes, critics tend to refer to the EU as a technocracy or a bureaucracy.

Christian noted that it is difficult for the EU to communicate with its citizens, as the member states comprise many countries and languages, and disinformation can spread easily. He therefore suggested linking the institutions more closely with citizens: we need to empower people with knowledge and discuss matters with them openly and fairly. Finally, Christian mentioned that the 2019 European Parliament elections will be a test of European democracy.

Iryna Gurevych and Andreas Hanselowski

Natural Language Processing for Automated Fact-Checking

According to Iryna Gurevych, professor at the Technical University Darmstadt and director of the Ubiquitous Knowledge Processing (UKP) Lab, false information can be distributed increasingly fast through social media. Moreover, manual fact-checking is labor intensive and many false claims still reach the public. Although fully automated fact-checking is not yet feasible, AI can assist fact-checkers in accelerating, and thereby improving, the fact-checking process.

Iryna named some of the major challenges of automated fact-checking: the huge amount of training data required, since AI methods are data hungry, and the fact that source material comes from various domains, for instance social media and hyperpartisan news. Furthermore, effective machine learning methods should perform well enough for end users, generalize across domains and be transparent in order to genuinely help fact-checkers.

Iryna then introduced the Snopes corpus, a richly annotated corpus for different tasks within automated fact-checking. The main idea of the corpus is to generate fine-grained annotations for each validated claim and to incorporate a large collection of web documents that can be used as evidence to support analysts performing fact-checking. According to Iryna, the UKP Lab developed a cost-efficient approach to constructing large corpora for fact-checking using crowdsourcing, annotation guidelines, a user interface and more.

Andreas Hanselowski, researcher at the UKP Lab, presented the UKP Lab’s participation in the First Workshop on Fact Extraction and Verification (FEVER) in 2018. The FEVER task was to retrieve documents, extract evidence and validate the claim. The evaluation metric was the prediction of the correct claim label: whether a piece of information is supported, refuted or there is not enough information. The UKP Lab’s system placed third among the 23 competing teams at predicting the label.

To conclude, the Snopes corpus is a heterogeneous, multi-domain corpus rich in information with high inter-annotator agreement. The UKP Lab’s system for automated fact-checking, built for the FEVER shared task, can be regarded as a high-performing document retrieval and entity linking tool able to retrieve articles from Wikipedia on the basis of a given claim. Iryna, Andreas and their team have developed a model for the validation of claims; their approach to transparent automated fact-checking consists of document retrieval, evidence extraction, stance detection and claim validation.
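
The sketch below walks through the four stages named above (document retrieval, evidence extraction, stance detection and claim validation) on a toy two-document corpus. Simple word-overlap scoring stands in for the learned retrieval, stance and validation models of the UKP Lab system.

```python
CORPUS = {
    "doc1": "The Eiffel Tower is located in Paris and was completed in 1889.",
    "doc2": "Paris is the capital of France.",
}

def retrieve(claim, k=1):
    # Rank documents by word overlap with the claim and keep the top k.
    words = set(claim.lower().split())
    ranked = sorted(CORPUS, key=lambda d: -len(words & set(CORPUS[d].lower().split())))
    return ranked[:k]

def extract_evidence(doc_id):
    # Treat each sentence of the retrieved document as candidate evidence.
    return [s.strip() for s in CORPUS[doc_id].split(".") if s.strip()]

def stance(claim, sentence):
    # Crude stand-in for stance detection: high overlap counts as agreement.
    overlap = len(set(claim.lower().split()) & set(sentence.lower().split()))
    return "agree" if overlap >= 3 else "unrelated"

def validate(claim):
    sentences = [s for d in retrieve(claim) for s in extract_evidence(d)]
    if any(stance(claim, s) == "agree" for s in sentences):
        return "SUPPORTED"
    return "NOT ENOUGH INFO"   # a real system would also detect "REFUTED"

print(validate("The Eiffel Tower is located in Paris"))  # SUPPORTED
```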

Check Iryna’s and Andreas’ presentation for more details regarding Natural Language Processing for Automated Fact-Checking.

Conclusions

To conclude, the key points of this report are summarized below.

  • At the current intersection of AI and news organizations, AI can be used to support both journalists and the public
  • Various speakers stressed the importance of crowdsourcing for AI and text analysis
  • Manual fact-checking is labor intensive; AI can assist fact-checkers, who can then focus more on the stories than on validating information
  • The general structure of social media platforms is prone to the spread of fake news
  • Multilingualism, cross-media monitoring and OSINT can help researchers and journalists save time on fact-checking
  • Conferences such as Fake News and other AI Challenges for the News Media in the 21st Century are needed to bring together experts from various fields to share ideas and compare perspectives on how AI can support both readers and journalists in uncovering the truth
