Update README.md
README.md
CHANGED
|
@@ -4,7 +4,7 @@ tags:
|
|
| 4 |
- DNA
|
| 5 |
- genomics
|
| 6 |
---
|
| 7 |
-
This is the official pre-trained model introduced in [
|
| 8 |
|
| 9 |
|
| 10 |
|
|
@@ -19,3 +19,16 @@ This is the official pre-trained model introduced in [GROVER : A foundation DNA
|
|
| 19 |
Preliminary analysis shows that re-tokenizing a sequence with Byte Pair Encoding (BPE) changes the resulting tokens significantly if the sequence is shorter than 50 nucleotides. For sequences longer than 50 nucleotides, you should still be careful with the sequence edges.
|
| 20 |
We advise adding 100 nucleotides at the beginning and end of every sequence in order to guarantee that your sequence is represented with the same tokens as in the original tokenization.
|
| 21 |
We also provide the tokenized chromosomes with their respective nucleotide mappers (they are available in the folder tokenized chromosomes).
|
| 4 |
- DNA
|
| 5 |
- genomics
|
| 6 |
---
|
| 7 |
+
This is the official pre-trained model introduced in [DNA language model GROVER learns sequence context in the human genome](https://www.nature.com/articles/s42256-024-00872-0)
|
| 8 |
|
| 9 |
|
| 10 |
|
|
|
|
| 19 |
Preliminary analysis shows that re-tokenizing a sequence with Byte Pair Encoding (BPE) changes the resulting tokens significantly if the sequence is shorter than 50 nucleotides. For sequences longer than 50 nucleotides, you should still be careful with the sequence edges.
|
| 20 |
We advise adding 100 nucleotides at the beginning and end of every sequence in order to guarantee that your sequence is represented with the same tokens as in the original tokenization.
|
| 21 |
We also provide the tokenized chromosomes with their respective nucleotide mappers (they are available in the folder tokenized chromosomes).
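The flanking advice above can be sketched in plain Python. This is only an illustration, not part of the GROVER code: the helper name `extract_with_flanks`, the 0-based half-open coordinates, and the toy chromosome string are all assumptions made for this example, and the sketch does not call the tokenizer itself.

```python
FLANK = 100  # nucleotides of extra context on each side, as advised above


def extract_with_flanks(chromosome, start, end, flank=FLANK):
    """Hypothetical helper: cut out [start, end) plus up to `flank` nucleotides per side.

    Coordinates are assumed to be 0-based and half-open. Returns the padded
    sequence together with the number of nucleotides actually added on the
    left and on the right (fewer than `flank` near chromosome ends).
    """
    if not 0 <= start < end <= len(chromosome):
        raise ValueError("region must lie inside the chromosome")
    padded_start = max(0, start - flank)
    padded_end = min(len(chromosome), end + flank)
    padded = chromosome[padded_start:padded_end]
    return padded, start - padded_start, padded_end - end


# Toy data standing in for a real chromosome sequence.
toy_chromosome = "ACGT" * 100
padded, left, right = extract_with_flanks(toy_chromosome, 150, 210)

# The region of interest can be recovered from the padded sequence.
region = padded[left:len(padded) - right]
```

The returned `left` and `right` counts let you locate the original region inside the padded sequence after tokenization, so that tokens covering only the flanks can be ignored.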
|
| 22 |
+
|
| 23 |
+
### BibTeX entry and citation info
|
| 24 |
+
|
| 25 |
+
```bibtex
|
| 26 |
+
@article{sanabria2024dna,
|
| 27 |
+
title={DNA language model GROVER learns sequence context in the human genome},
|
| 28 |
+
author={Sanabria, Melissa and Hirsch, Jonas and Joubert, Pierre M and Poetsch, Anna R},
|
| 29 |
+
journal={Nature Machine Intelligence},
|
| 30 |
+
pages={1--13},
|
| 31 |
+
year={2024},
|
| 32 |
+
publisher={Nature Publishing Group UK London}
|
| 33 |
+
}
|
| 34 |
+
```
|