SecondLayer Maps Cost and Design of 860B Legal AI Trained on 2TB Ukrainian Law
Ukrainian legal-tech firm SecondLayer has outlined a hypothetical project to train an 860-billion-parameter Mixture-of-Experts (MoE) AI model on approximately 2 terabytes of Ukrainian and European legal data hosted on Google Cloud Platform. The corpus includes 96.2 million full-text Ukrainian court decisions, public registries, annotated legislation, Supreme Court rulings, and Spanish and EU legal texts. After deduplication and cleaning, the usable training corpus is estimated at 800–1,000 GB, yielding roughly 280–330 billion tokens, about one-fiftieth the size of DeepSeek V3's original 14.8-trillion-token dataset. The proposed architecture mirrors DeepSeek V3, which has 671 billion total parameters but activates only 37 billion per token; because only a fraction of the weights run for each token, high-volume inference costs less than with a dense model of comparable size. The exercise is presented as a technical breakdown of dataset composition, model architecture, compute costs, and the capabilities such a domain-specific legal model could deliver.
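The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from SecondLayer's article: the function names are hypothetical, 1 GB is assumed to mean 10^9 bytes, and the low and high ends of the corpus and token ranges are assumed to pair with each other.

```python
# Back-of-the-envelope check of the numbers quoted in the summary.
# Assumptions (not from the source): 1 GB = 10**9 bytes, and the low/high
# ends of the corpus-size and token-count ranges correspond to each other.

CORPUS_GB = (800, 1000)          # usable corpus after cleaning, in GB
TOKENS_BILLION = (280, 330)      # estimated token count, in billions
DEEPSEEK_TOKENS_BILLION = 14_800 # 14.8 trillion tokens
TOTAL_PARAMS_BILLION = 671       # total parameters quoted for DeepSeek V3
ACTIVE_PARAMS_BILLION = 37       # parameters active per token


def bytes_per_token(corpus_gb: float, tokens_billion: float) -> float:
    """Average number of bytes of text per token."""
    return (corpus_gb * 1e9) / (tokens_billion * 1e9)


def size_ratio(reference_tokens_billion: float, tokens_billion: float) -> float:
    """How many times larger the reference dataset is."""
    return reference_tokens_billion / tokens_billion


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of parameters used for each token in the MoE model."""
    return active_billion / total_billion


if __name__ == "__main__":
    for gb, tok in zip(CORPUS_GB, TOKENS_BILLION):
        print(f"{gb} GB / {tok}B tokens: {bytes_per_token(gb, tok):.2f} bytes per token")
    for tok in TOKENS_BILLION:
        ratio = size_ratio(DEEPSEEK_TOKENS_BILLION, tok)
        print(f"DeepSeek V3 dataset is {ratio:.1f}x larger than {tok}B tokens")
    share = active_fraction(ACTIVE_PARAMS_BILLION, TOTAL_PARAMS_BILLION)
    print(f"Active parameters per token: {share:.1%}")
```

Under these assumptions the dataset ratio comes out between roughly 45x and 53x, in line with the "about one-fiftieth" figure, and about 5.5% of the parameters are active for any given token.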
This is an AI-generated summary. ShortSingh links to the original source for the complete article.