Overview
The British Library is the national library of the United Kingdom and one of the world’s largest libraries. Its collections include more than 150 million items in more than 400 languages, including books, magazines, manuscripts, maps, music scores, newspapers, patents, databases, philatelic items, prints and drawings, and sound recordings. Since 2013 the Library has also collected non‑print materials through legal deposit, including ebooks, ejournals and UK domain websites.
Challenge
As one of the world’s largest and most reputable research libraries, the British Library needs to service millions of requests and searches of its digital content. While serving legitimate users and keeping its content discoverable, the Library also wanted to ensure that its website does not get bogged down by web crawler bots, which are responsible for close to 10% of all traffic. To do so, it needed to throttle requests from web crawlers, spiders, and bots.
As a public institution, the Library also needs to follow information security guidelines and best practices mandated by the UK Government.
Last but not least, the Library’s Technology team wanted to expand the Library’s digital footprint by developing APIs that can be consumed by interested parties both internally and externally. As an example, other applications can retrieve archived digital content via an API. An efficient and high‑performance API gateway is crucial to serving the anticipated millions of API requests.
Solution
NGINX Plus’s rate‑limiting feature solved the Library’s problem with web crawler bots. As a Layer 7 load balancer, NGINX Plus can look inside HTTP headers and delay or drop requests from specific hosts. It can also impose restrictions on ranges of IP addresses, known as blacklisting, on a per‑URL basis.
According to John Gostick, Technical Services Manager for Discovery and Access at the British Library, “NGINX’s rate limiting capability was pivotal in throttling requests from web crawler bots”.
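To make the throttling idea concrete, the following is a minimal sketch of per-client rate limiting applied only to requests that look like crawlers. It is an illustration of the general technique, not the Library's configuration and not how NGINX Plus is implemented internally: it uses a simple token bucket, and the `BotRateLimiter` class, the `BOT_PATTERN` expression, and the `rate` and `burst` values are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for crawler user agents; a real deployment would tune this.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


@dataclass
class BotRateLimiter:
    """Token-bucket limiter applied only to requests that look like crawlers."""

    rate: float = 1.0  # tokens replenished per second, per client (assumed value)
    burst: int = 5     # maximum tokens a client can hold (assumed value)
    _buckets: dict = field(default_factory=dict)  # client_ip -> (tokens, last_seen)

    def allow(self, client_ip: str, user_agent: str, now: float) -> bool:
        # Requests that do not look like crawlers are never throttled.
        if not BOT_PATTERN.search(user_agent):
            return True
        tokens, last_seen = self._buckets.get(client_ip, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last_seen) * self.rate)
        if tokens < 1.0:
            self._buckets[client_ip] = (tokens, now)
            return False  # over the limit: the caller would delay or reject
        self._buckets[client_ip] = (tokens - 1.0, now)
        return True
```

In this sketch the caller passes the current time explicitly, which keeps the behavior easy to reason about: a crawler that exhausts its burst is refused until enough time has passed to earn a new token, while ordinary browser traffic passes through untouched.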
The Technology team first evaluated NGINX Open Source, but upgraded to NGINX Plus for a number of reasons. Library policy mandates the use of enterprise‑grade software with professional support, and NGINX Plus comes with enterprise‑grade, award‑winning support. In addition, NGINX and NGINX Plus are renowned for their ability to handle the massive traffic volumes experienced by large organizations; at the Library, NGINX Plus handles 11 million browser requests per day and up to 7,000 search requests per hour.
With NGINX Plus, the Library has been able to improve the user experience by implementing active health checks. NGINX Plus automatically detects upstream server failures and stops sending requests to failed servers, so users experience fewer timeouts and errors. Performance is maintained because NGINX Plus distributes requests among the remaining upstream servers. To further ensure patrons have constant access to the Library’s web portal, the Technology team has deployed NGINX Plus in a high availability (HA) configuration.
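The health check behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not NGINX Plus code: the `UpstreamPool` class, the `probe` callback, the `fails` threshold, and the server names are all hypothetical, and the selection policy is plain round robin over whichever servers are currently considered healthy.

```python
from typing import Callable, Dict, List


class UpstreamPool:
    """Round-robin pool that skips servers failing consecutive health probes."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool], fails: int = 2):
        self.servers = list(servers)
        self.probe = probe    # hypothetical probe: returns True when a server is healthy
        self.fails = fails    # consecutive failed probes before a server is skipped
        self.failures: Dict[str, int] = {s: 0 for s in self.servers}
        self._cursor = 0

    def run_health_checks(self) -> None:
        # Intended to be called periodically, independent of client traffic.
        for server in self.servers:
            if self.probe(server):
                self.failures[server] = 0  # one success restores the server
            else:
                self.failures[server] += 1

    def healthy(self) -> List[str]:
        return [s for s in self.servers if self.failures[s] < self.fails]

    def pick(self) -> str:
        # Distribute requests only among servers currently considered healthy.
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy upstream servers")
        server = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return server
```

Because the probes run on their own schedule, a failed server is taken out of rotation before users send requests to it, and it rejoins the pool automatically once a probe succeeds again.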
From an operational standpoint, NGINX Plus’s live activity monitoring provides the team with critical insights into the health and performance of their applications.
Results
Improved Reliability, Availability, and Visibility
The high availability deployment of NGINX Plus has been highly stable, with failover between the primary and backup NGINX Plus instances causing no disruption to service. Configuration is synchronized across both NGINX Plus servers, saving time and effort by eliminating the need to manage servers individually. With live activity monitoring, the Technology team can monitor the health of all upstream servers, including access status and errors.
Improved Security
With rate limiting, the team has throttled web crawler bot activity, thereby protecting upstream servers and reducing the load on them. NGINX Plus has been crucial to achieving and maintaining Cyber Essentials certification by helping the Library secure internet connections and providing access controls to data and services.
Future-Proofing Digital Strategy
The British Library’s product roadmap includes providing APIs that can be used by third parties. NGINX Plus provides a foundation for delivering reliable and high‑performing APIs. The Library doesn’t have to rely on another vendor to handle APIs, because NGINX Plus offers robust functionality such as request routing, rate limiting, and API authentication. The NGINX Application Platform is the industry’s only solution that combines reverse proxy, cache, load balancer, API gateway, and WAF functions into a single, dynamic application gateway for north‑south app and east‑west API traffic. According to John Gostick, “Using the same solution to handle our north‑south and API traffic helps reduce costs, complexity and learning curve”.
About British Library
The British Library is the national library of the United Kingdom and one of the world’s greatest research libraries. It provides world‑class information services to the academic, business, research, and scientific communities and offers unparalleled access to the world’s largest and most comprehensive research collection. The Library’s collection has developed over 250 years and exceeds 150 million separate items representing every age of written civilization. It includes books, journals, manuscripts, maps, stamps, music, patents, photographs, newspapers, and sound recordings in all written and spoken languages. Up to 10 million people visit the British Library – www.bl.uk – every year, where they can view up to 4 million digitized collection items and over 40 million pages.