Because multilingual and multimedia collections of extremist content and advanced analytical methodologies are lacking, our empirical understanding of the Internet and the dark web remains very limited. Content mining and intelligence gathering on the Internet are an increasing challenge for security bodies, financial organizations (e.g. financial intelligence units, or FIUs), and law enforcement agencies. Tracking large volumes of digital information from sources such as the public Internet, the dark web, the long-tail web, blogs, and other social networks creates new challenges for the research community. A first test is to create intelligent crawlers that can identify any link on the web and extract the digital footprint from each web page. Some of the key challenges we face lie in automatic multilingual text analysis, the harmonization of extracted knowledge, and unique identity resolution. Taxonomies and thesauri do not offer a complete solution for automatically discovering hidden relations or newly coined expressions for named entities. To understand shadow groups, we need to apply advanced technologies from artificial intelligence and computational linguistics. In this paper we share experience gained from various projects in Europe, Russia, and Central Asia. We discuss how an ontology-driven approach to information extraction from large multilingual document collections can help build understanding and thus valuable knowledge. We further demonstrate how to merge the ontologies used for different domains and languages using the concept of an upper ontology, and we conclude by sharing insights on how to create rules for automatic identity resolution for specific named entities.