Nice presentation on Tumblr Architecture – Scaling Tumblr
The following tools are mentioned in the presentation:
Apache, PHP, Ruby, MySQL, Git
Scala + Finagle – Finagle is a network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language. Finagle provides a rich set of protocol-independent tools.
Redis – Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
HBase – Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Hadoop – The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
collectd – collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files.
OpenTSDB – OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graph-able.
Gearman – Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
Kafka – Apache Kafka is a distributed publish-subscribe messaging system.
HAProxy – The Reliable, High Performance TCP/HTTP Load Balancer.
Varnish – Varnish is a web application accelerator.
nginx – nginx [engine x] is an HTTP and reverse proxy server, as well as a mail proxy server.
Scribe – Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.
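To make the "farm out work" idea in the Gearman entry above more concrete, here is a minimal sketch of the job-queue pattern in plain Python. It is not Gearman's API and is not taken from the presentation: the names `run_jobs` and `handler` are hypothetical, and the workers are threads in one process rather than separate machines, which is only an approximation of how a real job server distributes work.

```python
import queue
import threading


def run_jobs(payloads, handler, num_workers=3):
    """Farm payloads out to worker threads and return results in input order."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work for this worker
                return
            job_id, payload = item
            outcome = handler(payload)
            with lock:  # protect the shared results dict
                results[job_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The "client" side: submit jobs, then one sentinel per worker.
    for job_id, payload in enumerate(payloads):
        jobs.put((job_id, payload))
    for _ in threads:
        jobs.put(None)

    for t in threads:
        t.join()
    return [results[i] for i in range(len(payloads))]


if __name__ == "__main__":
    print(run_jobs(["resize", "thumbnail", "notify"], str.upper))
```

Whichever idle worker pulls the next job from the shared queue runs it, which is a simple form of the load balancing the description mentions.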