Why Your LLM API Calls Slow Down: Physics, Distance, and Data Centres
A call to a large language model API travels from your device through an ISP, often across submarine fibre cables, and into a distant data centre before a GPU processes it and sends a response back. Light in optical fibre travels at roughly two-thirds of its speed in a vacuum, about 200,000 km per second, so distance alone sets a hard physical floor on how quickly a response can arrive. Developers in Nigeria, for example, face 100–200 milliseconds of geographic latency on top of inference time when hitting servers in Virginia or Frankfurt, because that is where the infrastructure is concentrated. The United States hosts over 5,500 data centres while Nigeria has just 17, a gap that directly shapes the experience of building and using AI tools from the African continent. The pre-response pause users often notice in streaming chatbots is therefore less about model speed and more about the unavoidable physics of global data travel.
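The distance limit described above can be estimated with a few lines of arithmetic. The following is a minimal Python sketch, not a measurement: the function names, the city coordinates (approximate values for Lagos and for Ashburn, Virginia) and the straight great-circle path are all assumptions made for illustration. It uses only the two-thirds-of-light-speed figure from the summary.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_FRACTION = 2 / 3             # "roughly two-thirds" figure from the summary
EARTH_RADIUS_KM = 6371.0           # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest surface distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fibre, ignoring routing and queuing."""
    fibre_speed_km_s = SPEED_OF_LIGHT_KM_S * FIBRE_FRACTION
    return 2 * distance_km / fibre_speed_km_s * 1000


# Approximate coordinates, assumed for illustration only.
LAGOS = (6.45, 3.39)
ASHBURN_VA = (39.04, -77.49)

distance = great_circle_km(*LAGOS, *ASHBURN_VA)
print(f"Great-circle distance: {distance:.0f} km")
print(f"Minimum round trip:    {min_round_trip_ms(distance):.1f} ms")
```

Under these assumptions the floor works out to roughly 90 ms for a single round trip. Real cables do not follow great circles, and every router and exchange along the way adds delay, which is consistent with the 100–200 millisecond range quoted above.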
This is an AI-generated summary. ShortSingh links to the original source for the complete article.