Querying the SearchSECO database

Overview

The Query functionality lets you search the SearchSECO database for information about GitHub repositories. Enter the URL of a GitHub repository to retrieve the data the database holds about its methods, authors, and project versions.

Querying Guide

Follow these steps to use the Query feature:

Enter query fields

Navigate to the Query page and fill in the fields.

  1. In the Query form, find the field labeled "GitHub URL". Here, enter the URL of the repository you want to query. Make sure it's a valid GitHub URL, or you'll receive an error.

  2. Below the URL field, there's another field labeled "Branch". Enter the branch you want to query here. If the branch doesn't exist, an error message will be shown.
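The exact validation rules the form applies are not documented here. As a minimal sketch, assuming a repository URL must have the form https://github.com/<owner>/<repo> (optionally ending in .git), a client-side shape check could look like this. The function name and the allowed-character pattern are hypothetical, and checking whether the repository or branch actually exists still requires asking GitHub.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule: owner and repository names consist of letters,
# digits, '.', '_' and '-'. The application's real checks may differ.
_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_github_repo_url(url: str) -> bool:
    """Return True if url has the shape https://github.com/<owner>/<repo>."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return False
    # Ignore empty segments, e.g. from a trailing slash.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2:
        return False
    owner, repo = parts
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return bool(_NAME.fullmatch(owner)) and bool(_NAME.fullmatch(repo))


print(is_github_repo_url("https://github.com/octocat/Hello-World"))  # True
print(is_github_repo_url("https://github.com/octocat"))  # False
```

The check is deliberately shape-only so it can run before any network request; a failed check simply mirrors the error the form reports for an invalid URL.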

(Screenshot: Query page)

Submit the query

Click the "Submit" button to initiate the querying process. The application first validates your URL and GitHub Access Token. If no validation errors occur, the query process begins and you'll see a "Querying SearchSECO database..." message.

Pay for Hashes

Once the query is complete, you'll receive information about the cost and number of hashes found in the "Results" section. If you decide to continue, press the "Pay for hashes" button to pay for the session. Following this, a transaction request is sent to your wallet for approval.

(Screenshot: Query page, payment step)

The total cost is the cost per hash multiplied by the number of hashes in the project.
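As a worked example of this formula, the sketch below multiplies a per-hash cost by a hash count. The function name and the numbers are made up for illustration, and Decimal is used only to avoid binary floating-point rounding; how the application and SECOIN actually represent amounts (for example, token decimals) is not specified here.

```python
from decimal import Decimal


def total_query_cost(hash_cost: Decimal, hash_count: int) -> Decimal:
    """Total cost = cost per hash x number of hashes in the project."""
    if hash_count < 0:
        raise ValueError("hash_count must be non-negative")
    return hash_cost * hash_count


# Made-up numbers: 1,200 hashes at 0.01 SECOIN per hash.
print(total_query_cost(Decimal("0.01"), 1200))  # 12.00
```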

For an overview of what SECOIN is and how you can swap it for other monetary tokens, see the SECOIN page.

Review Results

After the payment is confirmed, the data is fetched from the SearchSECO database and shown in the "Results" section. To download the results, click the "Download as JSON" button.

Cancel Session

If you want to cancel the session at any point, click the "Cancel" button. This permanently deletes the current session and any data received, so download your data first if you need it.

Query Results

The query result is a JSON object containing the output of the application's mining and parsing of the analyzed code repositories. It consists of three main parts: methodData, authorData, and projectData.
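If you download the results, you can process them in your own scripts. The sketch below assumes the downloaded file contains a single JSON object with methodData, authorData, and projectData as top-level arrays; the function name load_query_result is hypothetical.

```python
import json
from typing import Any

# The three top-level parts described in this section.
SECTIONS = ("methodData", "authorData", "projectData")


def load_query_result(text: str) -> dict[str, list[dict[str, Any]]]:
    """Parse a query result and check that its three arrays are present."""
    result = json.loads(text)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object at the top level")
    for key in SECTIONS:
        if not isinstance(result.get(key), list):
            raise ValueError(f"missing or malformed section: {key}")
    return result


# Made-up sample document.
sample = '{"methodData": [], "authorData": [{"username": "alice"}], "projectData": []}'
data = load_query_result(sample)
print(len(data["authorData"]))  # 1
```

To use a downloaded file instead of the inline sample, read the file's contents into a string and pass that string to load_query_result.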

Method Data

The methodData array contains objects that represent each individual method that has been analyzed within a project. Here's an overview of the data it includes:

  • method_hash: A unique identifier for the method.
  • projectID: The identifier of the project where the method was found.
  • startVersion & startVersionHash: The timestamp and commit hash of the version where the method was first identified.
  • endVersion & endVersionHash: The timestamp and commit hash of the version where the method was last identified.
  • method_name: The name of the method.
  • file: The file in which the method was found.
  • lineNumber: The line number in the file where the method starts.
  • parserVersion: The version of the parser used.
  • vulnCode: Any identified vulnerability code.
  • authorTotal: The total number of authors who contributed to the method.
  • authorIds: An array containing the IDs of the authors who contributed to the method.
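Once loaded, the methodData array can be filtered and grouped with ordinary Python. The sketch below lists methods that carry a vulnerability code and counts methods per file. It assumes vulnCode is empty or null when no vulnerability was identified; the helper names and the sample entries are hypothetical.

```python
from collections import Counter
from typing import Any


def vulnerable_methods(methods: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Methods whose vulnCode is present and non-empty (assumed meaning)."""
    return [m for m in methods if m.get("vulnCode")]


def methods_per_file(methods: list[dict[str, Any]]) -> Counter:
    """Number of analyzed methods found in each file."""
    return Counter(m["file"] for m in methods)


def describe(method: dict[str, Any]) -> str:
    """Short 'file:line name' label for a method."""
    return f"{method['file']}:{method['lineNumber']} {method['method_name']}"


# Made-up sample entries using the field names listed above.
methods = [
    {"method_name": "parse", "file": "src/a.c", "lineNumber": 10, "vulnCode": ""},
    {"method_name": "load", "file": "src/a.c", "lineNumber": 42, "vulnCode": "example-vuln"},
    {"method_name": "main", "file": "src/b.c", "lineNumber": 1, "vulnCode": None},
]
for m in vulnerable_methods(methods):
    print(describe(m))  # src/a.c:42 load
print(methods_per_file(methods).most_common(1))  # [('src/a.c', 2)]
```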

Author Data

The authorData array contains objects representing each unique author who has contributed to the analyzed methods. Currently, it contains:

  • username: The username of the author.

Project Data

The projectData array contains objects that represent each unique project that has been analyzed. Here's an overview of the data it includes:

  • id: A unique identifier for the project.
  • versionTime & versionHash: The timestamp and commit hash of the project version.
  • license: The license of the project.
  • name: The name of the project.
  • url: The URL of the project.
  • authorName: The unique identifier of the author of the project.
  • authorMail: The email address associated with the author of the project.
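Methods refer to their project through projectID, so each method can be linked to its project's details by matching projectID against the project's id. This is a minimal sketch, assuming the two fields hold values of the same type. If projectData lists several versions under one id, the dictionary keeps the last entry, which should not matter as long as the name is the same across versions. The function name and sample data are hypothetical.

```python
from typing import Any


def method_project_names(result: dict[str, list[dict[str, Any]]]) -> list[tuple[str, str]]:
    """Pair each method name with the name of the project it was found in."""
    # Index projects by id; assumes projectID values use the same type as id.
    projects = {p["id"]: p for p in result["projectData"]}
    pairs = []
    for m in result["methodData"]:
        project = projects.get(m["projectID"])
        name = project["name"] if project is not None else "<unknown project>"
        pairs.append((m["method_name"], name))
    return pairs


# Made-up sample data.
result = {
    "methodData": [{"method_name": "parse", "projectID": 7}],
    "authorData": [],
    "projectData": [{"id": 7, "name": "demo-project"}],
}
print(method_project_names(result))  # [('parse', 'demo-project')]
```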

Why query SearchSECO?

Querying the SearchSECO database is the counterpart to running a Mining node. It provides a way to spend your SECOIN to gain insight into software evolution and quality.

The insights gained can be used to improve the global software ecosystem:

  • Software Maintenance: Information about the software evolution, such as changes in methods, can aid in predicting future software maintenance activities. Knowledge about who changed what, when and how can facilitate better management and coordination of development efforts, especially in large, distributed teams.
  • Quality Assessment: By evaluating various aspects of code evolution, it's possible to derive software quality metrics. Information like method complexity, frequency of change, and contributor details can help assess the overall health of a project.
  • Vulnerability Detection: The SearchSECO database also captures data about vulnerable code snippets. Querying this data can provide insights about potential vulnerabilities in the system, enabling proactive remediation efforts.
  • Historical Analysis: The history of changes made to a method, captured as different versions, offers a detailed understanding of how the software has evolved over time.
  • Contributor Analysis: Information about the contributors to a method or project can provide insights about the development practices followed by the team.
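As one concrete form of contributor analysis, the authorIds arrays in methodData can be tallied to see how many methods each author ID contributed to. This sketch works on raw IDs only, since how those IDs map to the usernames in authorData is not specified here; the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Any


def methods_per_author(methods: list[dict[str, Any]]) -> Counter:
    """Count how many analyzed methods each author ID contributed to."""
    counts: Counter = Counter()
    for m in methods:
        # Use a set so an ID repeated within one method counts once.
        counts.update(set(m.get("authorIds", [])))
    return counts


# Made-up sample: author IDs as they might appear in methodData.
methods = [
    {"method_name": "parse", "authorIds": ["a1", "a2"]},
    {"method_name": "load", "authorIds": ["a1"]},
]
print(methods_per_author(methods).most_common(1))  # [('a1', 2)]
```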

While mining nodes contribute to the growth of the database by extracting and parsing data from various repositories, querying lets you extract and utilize that data to gain valuable insights.