News Directory 3

DNA Database Searchable: Largest Ever Created

September 17, 2025 Lisa Park Tech
Original source: news.northeastern.edu

Prashant Pandey’s research aims to make the raw DNA sequencing data in the Sequence Read Archive searchable. The problem he is addressing and his proposed solution are summarized below.

The Problem:

* Vast, Untapped Data: The Sequence Read Archive (SRA) contains petabytes of raw DNA sequencing data. This data holds valuable insights, but it is currently difficult to search effectively.
* Searchability Gap: Assembled genomes (finished DNA sequences) are easily searchable, but the raw, fragmented “reads” within the SRA are not.
* Need for Retrospective Analysis: Scientists often want to know whether a newly discovered genetic sequence (like a virus or bacterium) has appeared in previous experiments stored in the SRA. This requires searching through the raw data.
* Transcript Search: The key is being able to search for longer genetic sequences (transcripts) within the millions of short reads in the SRA.

Prashant Pandey’s Solution:

* Building a Search Index: Pandey’s team is developing a system to create an index for the SRA data.
* K-grams & Embeddings: They break short reads into small subsequences called “K-grams” (length-k substrings, often called k-mers in genomics) and then map these into a “high-dimensional embedding”, essentially creating a digital fingerprint for each experiment.
* Digital Fingerprinting: These fingerprints are stored in an index, so a search compares compact fingerprints instead of scanning the raw reads.
* Query Comparison: When a new genetic sequence (transcript) is entered, its fingerprint is generated and compared to the index to quickly identify potential matches within the SRA data.
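
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual system: the article does not say how the embedding is computed, so the sketch substitutes a simple hashed binary vector (similar in spirit to a single-hash Bloom filter) for the fingerprint. The names `kgrams`, `bucket`, `fingerprint`, `build_index`, and `search`, the values of `K` and `DIM`, the 0.8 threshold, and the toy experiments are all hypothetical.

```python
import hashlib

K = 5          # k-gram length (assumed; chosen small for the toy data)
DIM = 1 << 16  # number of positions in the fingerprint vector (assumed)


def kgrams(seq, k=K):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bucket(kgram, dim=DIM):
    """Hash a k-gram to a stable position in a dim-dimensional vector."""
    digest = hashlib.blake2b(kgram.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % dim


def fingerprint(sequences, k=K, dim=DIM):
    """Sparse binary fingerprint: the set of vector positions that are set."""
    return {bucket(g, dim) for s in sequences for g in kgrams(s, k)}


def build_index(experiments):
    """Map each experiment ID to the fingerprint of all its reads."""
    return {exp_id: fingerprint(reads) for exp_id, reads in experiments.items()}


def search(index, transcript, threshold=0.8):
    """Return (experiment ID, score) pairs, best first.

    The score is the fraction of the query's fingerprint positions that are
    also set in the experiment's fingerprint. Hash collisions can produce
    false positives, so results are approximate.
    """
    query = fingerprint([transcript])
    if not query:
        return []
    hits = []
    for exp_id, fp in index.items():
        score = len(query & fp) / len(query)
        if score >= threshold:
            hits.append((exp_id, score))
    return sorted(hits, key=lambda hit: -hit[1])


# Toy data: exp_A holds three overlapping short reads from one longer sequence.
experiments = {
    "exp_A": ["ACGTTGCA", "TTGCATGC", "CATGCCGA"],
    "exp_B": ["GGGGGGGG", "CCCCCCCC"],
}
transcript = "ACGTTGCATGCCGA"

index = build_index(experiments)
hits = search(index, transcript)
print(hits)  # [('exp_A', 1.0)]
```

Note that no single read in `exp_A` contains the whole transcript, so a plain substring search over the reads would miss it. Every K-gram of the transcript does appear in some read, which is why comparing fingerprints built from K-grams still finds the experiment.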

Key Quotes from Pandey:

* “We have this treasure trove, this amazing and really insightful resource, which is just sitting around. We need the ability to search the raw sequencing data, all of it, at the petabyte scale.”
* “This requires innovating at all the levels of the stack, starting from new approximate indexing techniques, approximate data structures, building systems that can scale out in a distributed environment, hosting the whole thing in the cloud and making it publicly available for anyone to search.”

In essence, Pandey is working to unlock the potential of the SRA by making its raw data searchable, so that scientists can check new sequences against past experiments and make new discoveries.

