This repository contains some of the matrices described in

* Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations. ([arXiv:2303.09435](https://arxiv.org/abs/2303.09435))

Please cite the paper as:
```bibtex
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023},
}
```
For example, the file `gpt2-medium/wikipedia/6_9.pickle` contains the matrix trained, on the Wikipedia dataset, to transform 6th-layer hidden representations of tokens into 9th-layer hidden representations, for the `gpt2-medium` model from the Hugging Face `transformers` library. One loads and applies a matrix as follows:
```python
import pickle

import torch

file_name = 'gpt2-medium/wikipedia/6_9.pickle'

# Load the pickled transformation matrix.
with open(file_name, 'rb') as f:
    mat = pickle.load(f)

# The matrix is a square 2D torch tensor.
assert isinstance(mat, torch.Tensor)
assert mat.dim() == 2
assert mat.shape[0] == mat.shape[1]

# Apply the matrix to a (random, for illustration) hidden representation.
v = torch.rand(mat.shape[1])
w = mat @ v
assert w.shape == v.shape
```
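In practice one will want to apply the matrix to hidden states taken from the model itself rather than to a random vector. Below is a minimal sketch, assuming the Hugging Face `transformers` library; the indexing into `output_hidden_states` is our illustration (index 0 is the embedding output, index `i` is the output of layer `i`), so check the exact extraction convention against the training setup in the repository linked below.

```python
import pickle

import torch
from transformers import GPT2Model, GPT2Tokenizer

# Load model and tokenizer (sketch; gpt2-medium has hidden size 1024).
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2Model.from_pretrained('gpt2-medium')
model.eval()

# Load the 6th-to-9th-layer matrix.
with open('gpt2-medium/wikipedia/6_9.pickle', 'rb') as f:
    mat = pickle.load(f)

inputs = tokenizer('Hello world', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, so index 6 is (assumed to be)
# the output of the 6th transformer layer.
h6 = outputs.hidden_states[6][0]  # (seq_len, 1024)

# Apply the matrix row-wise: each token's 6th-layer representation is
# mapped to a predicted 9th-layer representation.
predicted_h9 = h6 @ mat.T         # (seq_len, 1024)
```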
More information can be found at [https://github.com/sashayd/mat](https://github.com/sashayd/mat).