Inspiration
I'm captivated by text analytics, specifically the ability to extract semantic meaning from large volumes of text through vectorization and to use that representation to help machines interpret the content. The first time I saw a knowledge graph demo, I was amazed by the process and the seemingly boundless possibilities it presented. So when the challenge to build a database backed by a knowledge graph came up, I saw it as the ideal opportunity to deepen my understanding of knowledge graphs and Azure Cosmos DB.
What it does
This application helps cybersecurity analysts with threat actor research by reducing the time they spend reading through all the relevant articles. It makes it easier to identify a threat actor from the information mentioned in an article. Using the application, analysts can streamline their research and work around the inconsistent naming conventions used across articles.
How we built it
The application extracts labelled entities using the OpenAI text-davinci-003 model. The model output is formatted into the vertices and edges of the knowledge graph, as a hierarchical data structure expressed in Gremlin syntax. The knowledge graph is then loaded into Azure Cosmos DB through the Gremlin API.
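The formatting step can be illustrated with a minimal sketch. It is not the project's actual code: the tuple layout, the function names, the `pk` partition key property, and the sample entities are all assumptions made for illustration. It turns extracted (source, relation, target) records into Gremlin `addV` and `addE` query strings.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (source, source_label, relation, target, target_label)
Triple = Tuple[str, str, str, str, str]


def escape(value: str) -> str:
    """Escape backslashes and single quotes for a Gremlin string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def vertex_query(label: str, vertex_id: str, properties: Dict[str, str]) -> str:
    """Build a g.addV(...) query string for one vertex."""
    query = f"g.addV('{escape(label)}').property('id', '{escape(vertex_id)}')"
    # 'pk' is an assumed partition key property name; use the container's own key.
    query += f".property('pk', '{escape(vertex_id)}')"
    for key, value in properties.items():
        query += f".property('{escape(key)}', '{escape(value)}')"
    return query


def edge_query(source_id: str, relation: str, target_id: str) -> str:
    """Build a query string that adds one edge between two existing vertices."""
    return (
        f"g.V('{escape(source_id)}').addE('{escape(relation)}')"
        f".to(g.V('{escape(target_id)}'))"
    )


def build_queries(triples: List[Triple]) -> List[str]:
    """Return vertex queries (deduplicated by id) followed by edge queries."""
    vertices: Dict[str, str] = {}  # id -> label, in first-seen order
    edges: List[str] = []
    for src, src_label, rel, dst, dst_label in triples:
        vertices.setdefault(src, src_label)
        vertices.setdefault(dst, dst_label)
        edges.append(edge_query(src, rel, dst))
    return [vertex_query(label, vid, {}) for vid, label in vertices.items()] + edges


if __name__ == "__main__":
    # Made-up placeholder entities, not real extraction output.
    sample = [
        ("ExampleActor", "threat_actor", "uses", "ExampleMalware", "malware"),
        ("ExampleActor", "threat_actor", "targets", "ExampleSector", "sector"),
    ]
    for q in build_queries(sample):
        print(q)
```

Emitting all vertex queries before the edge queries means every edge refers to vertices that already exist. The resulting strings could then be submitted one at a time with a Gremlin client such as gremlinpython.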
Challenges I ran into
Different Python applications require different Python versions, and some modules are deprecated in some versions. It took time to understand each problem and pin the correct version in the virtual environment to make the code work.
Accomplishments that I'm proud of
Using OpenAI for text analytics, understanding Gremlin syntax well enough to build a knowledge graph data model, and populating Azure Cosmos DB through the Gremlin API are the skills I gained over the last six weeks of the hackathon journey. Applying these skills to build a working application feels great.
What I learned
When building an application, we should break the process into segments and work through them one at a time. When using code from a GitHub repository, we should check that all of its dependencies are still available in the versions it expects.
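One way to make that dependency check routine is a small script run inside the virtual environment. The sketch below is an illustration, not part of the project: the `check_pins` helper and the `name==version` pin format are assumptions. It compares pinned versions against what is actually installed, using the standard library's `importlib.metadata`.

```python
from importlib import metadata
from typing import Dict, List


def check_pins(pins: List[str]) -> Dict[str, str]:
    """Compare 'name==version' pins against the packages installed in this environment."""
    report: Dict[str, str] = {}
    for pin in pins:
        name, _, wanted = pin.partition("==")
        name, wanted = name.strip(), wanted.strip()
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            report[name] = "missing"
            continue
        if installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, pinned {wanted}"
    return report


if __name__ == "__main__":
    # Hypothetical pin, used only to show the report format.
    print(check_pins(["some-package-name==1.0.0"]))
```

Running this right after cloning a repository and installing its requirements shows any missing or mismatched package before the application code fails at import time.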
What's next for BAE SYSTEM PROBLEM STATEMENT1-THREATACTOR DATABASE
Fine-tune the model to generate an enriched threat actor database. The base text-davinci-003 model works well for extracting entities; fine-tuning it should give more accurate insights.