Open Source Signal Discovery

It's out there. Virtually everywhere. In volumes until recently never even imagined. It, of course, is open source data — all the information embedded in the Internet, social media, and more. And the ability to capture and exploit the meaningful parts of it is a crucial addition to today's defense and intelligence tradecraft.

OSGS helps accomplish this by leveraging our extensive commercial experience in machine learning and predictive analytics to identify and analyze what matters in open source data. We're currently doing so, in fact, under a contract organized along the Department of Defense and Intelligence Community's Tasking, Collections, Processing, Exploitation, and Dissemination (TCPED) framework:

We extract content from virtually all sources (unstructured text, audio, video, and others) on a continuous basis and at the appropriate cadence, from real time to daily or weekly intervals, depending on the source and the task's requirements.

Our capabilities include:
Supporting multi-language analysis
We use our patented ontology of more than 87 million terms (and growing), which covers more than 130 languages, to relate terms within and across languages and create the proper structure for processing. Seeing "Libya" in a document, for example, we can easily identify alternate language spellings, such as French (Libye) and German (Libyen), in other documents. This allows us to derive concepts from the document, a critical step in organizing the data for processing.

That, in turn, lets us determine the relationship between the people, places, and things referenced; how context influences the author and those referenced; and whether the sentiment expressed is consistent or represents an anomalous change meriting further examination.
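The cross-language matching described above can be illustrated with a minimal sketch. It is not our production ontology: the `TermOntology` and `Concept` classes, the `country:libya` identifier, and the simple whitespace tokenizer are all assumptions made for illustration. It shows only the basic idea of mapping different spellings of a term to one shared concept.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One language-independent concept with its spellings per language."""
    concept_id: str
    labels: dict[str, set[str]] = field(default_factory=dict)  # language code -> spellings


class TermOntology:
    """Toy ontology (hypothetical): maps any known spelling to a shared concept."""

    def __init__(self) -> None:
        self._by_term: dict[str, Concept] = {}

    def add(self, concept_id: str, labels: dict[str, list[str]]) -> None:
        concept = Concept(concept_id, {lang: set(terms) for lang, terms in labels.items()})
        for terms in labels.values():
            for term in terms:
                # casefold() gives case-insensitive matching
                self._by_term[term.casefold()] = concept

    def lookup(self, term: str) -> Concept | None:
        return self._by_term.get(term.casefold())

    def concepts_in(self, text: str) -> set[str]:
        """Return the ids of all known concepts mentioned in the text."""
        # Naive tokenizer: split on whitespace and trim surrounding punctuation.
        tokens = (tok.strip(".,;:!?\"'()") for tok in text.split())
        return {c.concept_id for tok in tokens if (c := self.lookup(tok)) is not None}


ontology = TermOntology()
ontology.add("country:libya", {"en": ["Libya"], "fr": ["Libye"], "de": ["Libyen"]})

english = ontology.concepts_in("Talks on Libya resumed today.")
french = ontology.concepts_in("La Libye reprend les négociations.")
german = ontology.concepts_in("Die Gespräche in Libyen gehen weiter.")
```

All three documents resolve to the same concept id, which is what lets documents in different languages be grouped and compared on the concepts they mention rather than on their surface spelling.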

Representing context
Our geo-political ontology is captured dynamically as content is parsed and added to the system from both open and restricted sources. Then we represent and modify it for analysis and use in real time.

As an example, the graphic above shows a portion of our geo-political context devoted to OPEC. The clusters are the countries in OPEC; the spirals around them represent various leadership positions within each government, as well as connections to other organizations such as the G20 or the African Union. Additional visualization detail would include the individuals in specific government positions, as well as additional geo-political connections.
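One way to picture the structure behind such a graphic is a labeled graph in which countries, leadership positions, and organizations are nodes. The sketch below is illustrative only: the `ContextGraph` class, the relation labels, and the role node names are assumptions, not the representation our system uses.

```python
from __future__ import annotations

from collections import defaultdict


class ContextGraph:
    """Undirected labeled graph of entities (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # node -> {neighbor: relation label}
        self._edges: dict[str, dict[str, str]] = defaultdict(dict)

    def relate(self, a: str, b: str, relation: str) -> None:
        """Record a labeled connection between two entities."""
        self._edges[a][b] = relation
        self._edges[b][a] = relation

    def neighbors(self, node: str, relation: str | None = None) -> set[str]:
        """Entities connected to node, optionally filtered by relation label."""
        return {
            other
            for other, label in self._edges.get(node, {}).items()
            if relation is None or label == relation
        }

    def shared_context(self, a: str, b: str) -> set[str]:
        """Entities directly connected to both a and b."""
        return self.neighbors(a) & self.neighbors(b)


graph = ContextGraph()
graph.relate("OPEC", "Saudi Arabia", "member")
graph.relate("OPEC", "Nigeria", "member")
graph.relate("G20", "Saudi Arabia", "member")
graph.relate("African Union", "Nigeria", "member")
# Role nodes are placeholders; individuals would attach to them as further nodes.
graph.relate("Saudi Arabia", "Saudi Arabia / head of state", "position")
graph.relate("Nigeria", "Nigeria / head of state", "position")

opec_members = graph.neighbors("OPEC", relation="member")
common = graph.shared_context("Saudi Arabia", "Nigeria")
```

Because new relations are added with a single `relate` call, a structure like this can be extended as content is parsed, which fits the dynamic capture described above.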

Next is a closer look at the OPEC portion of the graph showing some of Saudi Arabia's context within our system.

At this stage we apply our machine learning and predictive analytic techniques to identify information of interest, taking into account the direct and indirect factors that a human analyst would apply if he or she could process the volume of open source data being created today. Our exploitation capability supplements a team of human analysts by alerting them to the most interesting information based on their directed navigation of our system.

Applying predictive analytics
We use predictive analytics to capture the tradecraft of human analysts and apply machine learning algorithms to the real-time flow of social media and Internet data. The expertise we have gained building analytic systems for the Global Fortune 500 and various Government entities allows us to develop the specific algorithms necessary to determine the Interest Score, the key element of our open source exploitation capability.

The core of our predictive analytic exploitation, our Interest Score is composed of multiple elements, a typical example of our ensemble approach to solving analytic problems. Those elements include:
  • sentiment score
  • threat score
  • context score
  • context anomaly score
  • influence score
Each score represents an application of complex analytic techniques, based on statistical analysis and statistical validation of the results. We also take the added step of explaining how we arrived at the Interest Score to provide analysts with a "reason code" that allows them to apply their judgment to results. This also begins a formal feedback loop from the human analyst to our solution and its support staff that enhances the machine learning feature of our solutions.
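As a rough illustration of how an ensemble score with reason codes might be assembled, the sketch below combines the five elements listed above with a simple weighted sum. The weights, the 0-to-1 scale, the `interest_score` function, and the reason code format are all assumptions for illustration; the actual Interest Score relies on more complex analytic techniques than a fixed weighted sum.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights: the five element names come from the list above,
# but how they are actually combined is not specified here.
WEIGHTS = {
    "sentiment": 0.15,
    "threat": 0.30,
    "context": 0.15,
    "context_anomaly": 0.25,
    "influence": 0.15,
}


@dataclass(frozen=True)
class InterestResult:
    score: float
    reason_codes: list[str]


def interest_score(scores: dict[str, float], top_n: int = 2) -> InterestResult:
    """Combine element scores (assumed to lie in [0, 1]) into one score.

    The reason codes name the elements that contributed most, so an
    analyst can see why an item was flagged.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing element scores: {sorted(missing)}")

    contributions = {name: WEIGHTS[name] * scores[name] for name in WEIGHTS}
    total = sum(contributions.values())

    # Rank elements by how much each one added to the total.
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    reasons = [f"{name}: contributed {contributions[name]:.2f}" for name in ranked[:top_n]]
    return InterestResult(round(total, 4), reasons)


result = interest_score(
    {
        "sentiment": 0.2,
        "threat": 0.9,
        "context": 0.5,
        "context_anomaly": 0.8,
        "influence": 0.4,
    }
)
```

In this example the threat and context anomaly elements dominate the total, so they appear as the reason codes. An analyst who disagrees with a flagged item can point to the specific element that misled the score, which is the kind of input a feedback loop needs.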