News Directory 3

DNA Database Searchable: Largest Ever Created

September 17, 2025, by Lisa Park, Tech Editor

This summary covers Prashant Pandey’s research and the problem it addresses:

The Problem:

* Vast, Untapped Data: The Sequence Read Archive (SRA) contains petabytes of raw DNA sequencing data. This data holds valuable insights, but it is currently difficult to search effectively.
* Searchability Gap: Assembled genomes (finished DNA sequences) are easily searchable, but the raw, fragmented “reads” within the SRA are not.
* Need for Retrospective Analysis: Scientists often want to know whether a newly discovered genetic sequence (such as one from a virus or bacterium) has appeared in previous experiments stored in the SRA. Answering that question requires searching through the raw data.
* Transcript Search: The key is being able to search for longer genetic sequences (transcripts) within the millions of short reads in the SRA.

Prashant Pandey’s Solution:

* Building a Search Index: Pandey’s team is developing a system to create an index for the SRA data.
* K-grams & Embeddings: They break short reads into small subsequences called “K-grams” (fixed-length substrings, commonly called k-mers in genomics) and then map these into a “high-dimensional embedding”, essentially creating a digital fingerprint for each experiment.
* Digital Fingerprinting: These fingerprints are stored in an index, allowing for faster and more efficient searches.
* Query Comparison: When a new genetic sequence (transcript) is entered, its fingerprint is generated and compared to the index to quickly identify potential matches within the SRA data.
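The pipeline described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the team's actual system: the article does not say how the embedding is computed, so the sketch substitutes simple feature hashing (each K-gram is hashed into one slot of a fixed-length count vector) and compares fingerprints by cosine similarity. The names `kmers`, `fingerprint`, and `search`, the parameters `K` and `DIM`, and the sample reads are all hypothetical.

```python
import hashlib
import math

K = 5      # K-gram length (assumed; chosen small for the toy data)
DIM = 256  # fingerprint dimensionality (assumed)

def kmers(seq, k=K):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def bucket(kmer, dim=DIM):
    """Map a K-gram to a stable slot index in [0, dim)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % dim

def fingerprint(sequences, k=K, dim=DIM):
    """Hash all K-grams of the sequences into an L2-normalised count vector."""
    vec = [0.0] * dim
    for seq in sequences:
        for km in kmers(seq, k):
            vec[bucket(km, dim)] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))

def search(index, query_seq, top=3):
    """Rank experiments by similarity between their fingerprint and the query's."""
    q = fingerprint([query_seq])
    scored = [(cosine(q, fp), name) for name, fp in index.items()]
    return sorted(scored, reverse=True)[:top]

# Toy "experiments", each a handful of short reads (made-up data).
experiments = {
    "exp_A": ["ACGTACGTGACC", "GTGACCTTAGCA"],
    "exp_B": ["TTTTGGGGCCCC", "GGCCCCAAAATT"],
}

# Build the index once, then answer queries against it.
index = {name: fingerprint(reads) for name, reads in experiments.items()}
results = search(index, "ACGTACGTGACCTTAGCA")
for score, name in results:
    print(f"{name}: {score:.3f}")
```

Here the query overlaps the reads of `exp_A`, so that experiment ranks first. The point of the design is that a query touches one small vector per experiment instead of rescanning the raw reads; a real petabyte-scale system would need a far more careful embedding and a distributed index.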

Key Quotes from Pandey:

* “We have this treasure trove, this amazing and really insightful resource, which is just sitting around. We need the ability to search the raw sequencing data, all of it, at the petabyte scale.”
* “This requires innovating at all the levels of the stack, starting from new approximate indexing techniques, approximate data structures, building systems that can scale out in a distributed environment, hosting the whole thing in the cloud and making it publicly available for anyone to search.”
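To give a sense of what an “approximate data structure” can look like, here is a minimal Bloom filter sketch in Python. It is a generic textbook example, not a structure the article attributes to Pandey's system: a Bloom filter stores set membership in a fixed number of bits, never misses an item that was added, but may occasionally report an item that was not. The class `BloomFilter`, the helper `containment`, the sizes, and the sample reads are all assumptions for illustration.

```python
import hashlib

class BloomFilter:
    """Fixed-size approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            data = f"{seed}:{item}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def containment(bloom, query, k):
    """Fraction of the query's K-grams that the filter reports as present."""
    qs = kmers(query, k)
    if not qs:
        return 0.0
    return sum(km in bloom for km in qs) / len(qs)

# Index the K-grams of one toy experiment (made-up reads).
K = 5
reads = ["ACGTACGTGACC", "GTGACCTTAGCA"]
bloom = BloomFilter()
for read in reads:
    for km in kmers(read, K):
        bloom.add(km)

hit = containment(bloom, "ACGTACGTGACCTTAGCA", K)   # query covered by the reads
miss = containment(bloom, "TTTTGGGGCCCCAAAA", K)    # unrelated query
print(hit, miss)
```

Because every K-gram of the first query was added, its containment is exactly 1.0. The unrelated query usually scores near zero, though false positives can push it slightly higher. That trade of a small error rate for a large reduction in memory is what makes approximate structures attractive at this scale.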

In essence, Pandey is working to unlock the potential of the SRA by making its vast store of raw data searchable, enabling scientists to make new discoveries and understand the genetic landscape more comprehensively.


