Where is HCA implemented?
From the paper, it is not clear to me how HCA works together with CSA. Figures 3 and 4 suggest they serve a similar purpose.
In the code, I can only find compress_ratio being used for KV compression, and that appears to correspond to the CSA described in the paper. So where is the code for HCA? I have read model.py and could not find the answer. If it is in another file, please tell me where HCA is implemented.
Thanks a lot.
They are implemented within the same Attention class. HCA does not use the indexer.
self.compress_ratio = args.compress_ratios[layer_id]
if self.compress_ratio:
    self.compressor = Compressor(args, self.compress_ratio, self.head_dim)
    if self.compress_ratio == 4:
        self.indexer = Indexer(args, self.compress_ratio)
    else:
        self.indexer = None
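Note that the lookup args.compress_ratios[layer_id] is per layer, so different layers can take different branches. Below is a minimal sketch of that selection. The ratio values are hypothetical (made up for illustration, not taken from the released config), attention_kind is a helper invented here, and the labels assume the indexer branch corresponds to CSA and the compressor-only branch to HCA:

```python
# Hypothetical per-layer schedule; the real values come from the model config.
compress_ratios = [0, 4, 128, 4, 128]


def attention_kind(ratio: int) -> str:
    """Mirror the branch structure of the Attention.__init__ excerpt above."""
    if not ratio:
        # Falsy ratio: no Compressor is created for this layer.
        return "no compression"
    if ratio == 4:
        # Compressor plus Indexer (assumed to be the CSA path).
        return "compressor + indexer (CSA)"
    # Compressor without Indexer (assumed to be the HCA path).
    return "compressor only (HCA)"


layer_kinds = [attention_kind(r) for r in compress_ratios]
for layer_id, kind in enumerate(layer_kinds):
    print(f"layer {layer_id}: ratio={compress_ratios[layer_id]} -> {kind}")
```

With a schedule like this, a single model instance would contain both kinds of layers, each fixed at initialization.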
Thanks for your reply.
Since this code runs during initialization, it seems that a given model uses either HCA or CSA after deployment, not both. So in which scenario do they coexist and form a hybrid attention? Also, is the fixed ratio of 4 chosen for simplicity, or is it the best practice?
Looking forward to your reply. Appreciate your great work.