Monday, April 27, 2009
A common problem with this approach is incentivizing home users, which the current paper addresses through "credits" for donated computational power (which may later be rewarded) and by allowing users to select computational topics they feel strongly about.
An interesting discussion point is the total energy use of this approach compared to a dedicated cluster (under a single organization's control). For example, BOINC launches multiple redundant tasks to guard against malicious results, which is clearly a waste of energy. These tasks can also wake processors from low-power sleep states (a global optimizer could reduce this energy use) and transfer data over long distances.
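To make the redundancy overhead concrete, here is a toy Python sketch of redundant task issuance with majority voting; the replication factor and voting rule are my assumptions for illustration, not BOINC's exact policy:

```python
import random
from collections import Counter

def run_on_volunteer(task, malicious_rate=0.05):
    """Simulate one volunteer host; a small fraction return bogus results."""
    if random.random() < malicious_rate:
        return "bogus"
    return f"result({task})"

def validate_with_redundancy(task, replicas=3):
    """Issue the same task to several hosts and accept the majority answer.

    Every redundant copy burns CPU time (plus wake-ups and transfers),
    which is the energy overhead discussed above.
    """
    results = [run_on_volunteer(task) for _ in range(replicas)]
    answer, votes = Counter(results).most_common(1)[0]
    return answer if votes > replicas // 2 else None  # no majority: reissue

print(validate_with_redundancy("work_unit_42"))
```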
One thing that I liked is the extensive use of randomization (the tracker returns random peers; optimistic unchoking picks random peers), which has shown good results in practice.
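For instance, the optimistic-unchoke idea boils down to something like this Python toy (slot counts are illustrative; a real client also rotates the optimistic slot periodically):

```python
import random

def choose_unchoked(peers, upload_rate, regular_slots=3):
    """Unchoke the best uploaders, plus one random peer (optimistic unchoke).

    The random slot is what lets new peers bootstrap and lets a client
    discover faster partners it would otherwise never try.
    """
    by_rate = sorted(peers, key=lambda p: upload_rate[p], reverse=True)
    unchoked = by_rate[:regular_slots]
    rest = [p for p in peers if p not in unchoked]
    if rest:
        unchoked.append(random.choice(rest))  # the optimistic unchoke
    return unchoked

peers = ["A", "B", "C", "D", "E"]
rates = {"A": 50, "B": 10, "C": 90, "D": 5, "E": 30}
print(choose_unchoked(peers, rates))
```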
Wednesday, April 15, 2009
Unfortunately, the paper is written from a cloud user's standpoint. From a business perspective, the leading cloud providers have no incentive to open up their interfaces. The history of successful IT businesses seems instead to point toward very closed systems, e.g., Microsoft, or Google (which appears open but is actually very closed).
In fact, this is the approach to the cloud that Google is pursuing with Google Apps. AppDrop could turn out to be a poor replica of the original for many reasons, given the unknown tweaks/details of the Google setup.
Wednesday, April 1, 2009
While I was convinced of the usefulness of having such high-level operations as language constructs, I was not fully convinced by the language itself, which at first glance does not look very appealing. I would argue (possibly wrongly :) ) that a language with these constructs but closer in syntax to an existing programming language would be adopted much more easily.
Monday, March 30, 2009
I liked X-Trace and I think it can be very useful for checking where a distributed execution got stuck, as it generates an easy-to-follow tree output.
The architecture is two-tiered, with a core DTrace module (which acts as a multiplexer and disables interrupts while tracing) and providers, the modules that actually perform the instrumentation. In this way, different instrumentation methodologies can be added. The authors had already implemented many different providers.
Users specify arbitrary predicates to monitor their programs using a C-like programming language called "D".
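To make the probe/predicate/action model concrete, here is a rough Python analogy (this is deliberately not D syntax; probe names and fields are invented): a probe fires, its predicate is checked, and only then does the action run.

```python
# A rough Python analogy of D's probe/predicate/action model: the core
# module multiplexes events from providers to matching consumer actions.
probes = []

def probe(name, predicate):
    """Register an action to run when probe `name` fires and `predicate` holds."""
    def register(action):
        probes.append((name, predicate, action))
        return action
    return register

@probe("syscall:read:entry", predicate=lambda ctx: ctx["execname"] == "bash")
def on_bash_read(ctx):
    print(f"{ctx['execname']} (pid {ctx['pid']}) entered read()")

def fire(name, ctx):
    """What the core module does: dispatch an event to matching probes."""
    for pname, pred, action in probes:
        if pname == name and pred(ctx):
            action(ctx)

fire("syscall:read:entry", {"execname": "bash", "pid": 1234})
```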
This architecture is interesting. I think one recurrent tradeoff in tracing systems is the amount of user work versus the overhead of tracing.
Wednesday, March 18, 2009
I liked that the paper presents a real problem that was detected using this tool. After reading the paper, I am not sure how much work is required to adapt Artemis to a new environment/application versus writing a quick-and-dirty application-specific script to monitor the interesting variables.
I think a tradeoff here is structuring the logs to push more work into the automated part of the analyzer and make analysis easier, versus the ease of generating (unstructured) logs.
Related to this, the paper does not specify how the DryadLINQ computation that summarizes the logs works, nor how storing the analyzed data in a commercial database can always scale.
Wednesday, March 4, 2009
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Dryad is used at wide scale at Microsoft, and I think it will be influential in 10 years because, as an extension of MapReduce-like jobs, it is the first paper to show how this can be done in a data center. However, due to the lack of an open-source implementation, the more complex Dryad paradigm lags MapReduce in worldwide usage.
It seems obvious that this paper will be influential in 10 years.
A tradeoff that can be identified from reading subsequent papers such as Dryad is the one between the simplicity of the model and its expressivity (the latter translating into more efficiency and an easier way to express some complex computations).
Monday, March 2, 2009
The main advantage that I see is that, unlike SQL, this language does not impose a schema on the information, is extensible with user-defined functions, and allows nested structures.
The paper also makes the case that it is easier for programmers to write in Pig than in declarative SQL (it is more natural to write imperative code, and it is well known that debugging declarative programs is difficult).
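To show the flavor of that imperative, step-at-a-time style, here is a Pig-like pipeline sketched in plain Python (the dataset and field names are made up; real Pig Latin has its own syntax):

```python
# A Pig-flavored pipeline in plain Python: each step names an intermediate
# result, which is what makes this style easier to debug than one big query.
from collections import defaultdict

records = [("alice", 25, 3), ("bob", 17, 9), ("carol", 25, 4)]

# FILTER records BY age >= 18;
adults = [r for r in records if r[1] >= 18]

# GROUP adults BY age;
groups = defaultdict(list)
for name, age, clicks in adults:
    groups[age].append(clicks)

# FOREACH groups GENERATE age, AVG(clicks);
avg_clicks = {age: sum(cs) / len(cs) for age, cs in groups.items()}
print(avg_clicks)  # {25: 3.5}
```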
In general I am skeptical when presented with yet another new language, and at first glance it seems that most examples could be written in SQL. However, after reading more, I actually liked Pig: I think there is a need for such a new language, and the choices made by Pig make sense to me. Since it is open source and not many such systems are readily available to outside communities, I would say Pig may have the traction to be influential in 10 years.
Wednesday, February 25, 2009
A key architectural component of PNUTS is a pub-sub system, used to ensure reliability and replication. PNUTS uses a proprietary message broker, and updates are considered committed when they have been published to the message broker.
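A minimal sketch of that commit rule as I read it, in Python (the broker API below is hypothetical, not PNUTS's actual interface): the write is acknowledged as committed once the broker has durably accepted it, and replicas apply it asynchronously.

```python
import queue

class MessageBroker:
    """Stand-in for PNUTS's proprietary broker (this API is made up)."""
    def __init__(self):
        self.log = queue.Queue()

    def publish(self, update):
        self.log.put(update)  # assumed durable once accepted
        return True

def write(broker, local_replica, key, value):
    """Commit point is publication, not replica application."""
    committed = broker.publish((key, value))
    if committed:
        local_replica[key] = value  # other replicas catch up asynchronously
    return committed                # ack to client at publish time

broker, replica = MessageBroker(), {}
print(write(broker, replica, "user:7", {"name": "x"}))  # True
```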
Wednesday, February 18, 2009
The high-level idea of Dynamo is to trade consistency for availability. The query model is single-key lookup, with eventual consistency. In Dynamo, conflict resolution occurs at reads rather than writes (in contrast to traditional systems or Google's Bigtable). Dynamo's architecture can be characterized as a zero-hop DHT, where each node has pointers to all the other nodes (to reduce latency and its variability).
I found it interesting to see how previous research in DHTs (e.g., virtual nodes) and in distributed systems (e.g., vector clocks) is combined to form a real, working system in industry.
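As a refresher on one of those building blocks, here is a minimal vector clock in Python; Dynamo attaches one to each object version so that concurrent writes can be detected and handed to the reader for reconciliation:

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (a local write)."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def descends(a, b):
    """True if version `a` happened after (or equals) version `b`."""
    return all(a.get(n, 0) >= k for n, k in b.items())

def conflict(a, b):
    """Neither version descends from the other: concurrent writes, so a
    Dynamo-style store returns both to the reader for reconciliation."""
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "replica_A")   # {'replica_A': 1}
v2 = increment(v1, "replica_B")   # descends from v1
v3 = increment(v1, "replica_C")   # concurrent with v2
print(conflict(v2, v3))           # True -> resolved at read time
```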
Wednesday, February 11, 2009
The Bigtable interface is very simple, consisting of a map from a key tuple with three fields (row, column, and timestamp) to an uninterpreted string value.
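A toy Python model of that data model (the row and column names echo the paper's webpage example; the rest is illustrative):

```python
# A toy model of the Bigtable data model: a map from
# (row, column, timestamp) to an uninterpreted string value.
table = {}

def put(row, column, value, timestamp):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    """Return the value with the highest timestamp for this cell."""
    cells = [(ts, v) for (r, c, ts), v in table.items()
             if r == row and c == column]
    return max(cells)[1] if cells else None

put("com.cnn.www", "contents:", "<html>v1</html>", timestamp=1)
put("com.cnn.www", "contents:", "<html>v2</html>", timestamp=2)
print(get_latest("com.cnn.www", "contents:"))  # <html>v2</html>
```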
For implementation, a row is the unit of distribution and load balancing.
I really liked this paper: it presents a simple abstraction that is useful in many circumstances and easy to scale. I think it will probably be influential in 10 years, because traditional databases are harder and harder to scale to modern data center sizes.
Chubby is adapted to the requirements of Google applications. Therefore, Chubby is optimized for availability and reliability rather than high performance, with locks expected to be coarse-grained (in time). Chubby is implemented as a set of servers that run a consensus protocol among themselves. Interestingly, Chubby's locks are advisory rather than mandatory, meaning that applications must themselves choose not to modify or corrupt a locked file. This is based mainly on two reasons: it makes the lock service simpler to implement, and the service does not have to handle truly malicious applications, as it runs in a closed environment.
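To illustrate what advisory (rather than mandatory) locking means in practice, here is a toy Python sketch; the API is made up and far simpler than Chubby's:

```python
class AdvisoryLock:
    """Advisory: holding the lock grants nothing by itself; clients must
    cooperate by checking it before acting."""
    def __init__(self):
        self.holder = None

    def acquire(self, client):
        if self.holder is None:
            self.holder = client
            return True
        return False

lock, data = AdvisoryLock(), {"x": 0}

def well_behaved_write(client, key, value):
    if lock.holder == client:   # cooperates: checks the lock first
        data[key] = value

def buggy_write(key, value):
    data[key] = value           # nothing physically stops this write

lock.acquire("app1")
well_behaved_write("app1", "x", 1)
buggy_write("x", 99)            # advisory locking cannot prevent it
print(data)                     # {'x': 99}
```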
The paper makes a thorough case for why a lock service is useful compared to a distributed consensus algorithm such as Paxos, which would require no new service. I mostly agree with the arguments presented in the paper: some applications are not built to be highly available as Paxos requires (e.g., they start as prototypes); application writing is simplified by a lock service (rather than implementing distributed consensus in every application); distributed consensus would still require advertising and storing small amounts of data, which is difficult to do and requires a lock through the name service anyway; and a single reliable node can make progress without requiring a quorum.
As a critique, the arguments for building a lock service rather than using a consensus protocol are valid but not well structured. Also, since locks are advisory rather than mandatory, poorly written applications could lead to hard-to-detect bugs.
I think this paper will be influential in 10 years, since it is a large step in the modern era of computer science, advancing the state of the art beyond the traditional file system models, whose assumptions have changed. Moreover, many companies (such as Microsoft) have implemented file systems that practically copy GFS.
What I was less comfortable with is the centralized master: although it brings advantages in simplicity and smarter replica placement, it feels in a way contradictory to the huge scaling issues that this file system is actually addressing.
Monday, February 9, 2009
The authors then outline the challenges of moving such an application to cloud computing. Among the most important are the difficulty of migrating between cloud and non-cloud deployments, and the fact that accountability and SLAs are difficult and unclear.
I think this is a classical dilemma in computer science: structure better for evolvability but lose efficiency, versus stay monolithic and not evolvable but more efficient (a similar example is micro-kernel vs. monolithic kernel). I think that in time the better structure always wins, although the time when the performance requirements can be met by such structuring cannot be easily determined. The principles presented earlier for structuring an application strongly advise better structuring for scalability/evolvability.
Wednesday, February 4, 2009
The paper presents a switching layer that enables deploying policies specifying which middleboxes should be used and in which order, making configuration much easier.
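The kind of policy this makes declarative might look like the following Python sketch (traffic classes and middlebox names are invented for illustration):

```python
# A sketch of the kind of policy the switching layer makes declarative:
# per traffic class, which middleboxes to traverse and in which order.
policies = {
    "http":  ["firewall", "load_balancer", "web_cache"],
    "smtp":  ["firewall", "spam_filter"],
    "other": ["firewall"],
}

def next_hop(traffic_class, last_middlebox=None):
    """Return the next middlebox a frame should visit for its class."""
    chain = policies.get(traffic_class, policies["other"])
    if last_middlebox is None:
        return chain[0]
    i = chain.index(last_middlebox)
    return chain[i + 1] if i + 1 < len(chain) else None  # None: deliver

print(next_hop("http"))               # firewall
print(next_hop("http", "firewall"))   # load_balancer
```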
I liked the paper, and I think it may be influential in 10 years because this type of enhancement would really improve the current state of on-path design.
The advantages of the proposed architecture are better fault tolerance and scaling with the number of nodes (although I'm not fully convinced by the second argument).
A clear disadvantage of this structure is that the bisection bandwidth is somewhat small, i.e., on the order of 2·log(hosts):1. This hurts random communication patterns and the random placement of data in cloud computing environments.
For this reason and due to its complexity, I would tend to think this paper will not be influential in 10 years, at least not in deployments.
The problem seems real, particularly since cloud computing presumably requires more bandwidth between data center nodes (at least due to higher utilization). The solution is interesting.
Adopting such a solution creates two main issues: load balancing becomes much harder due to the large variety of smaller-capacity paths, and wiring is more complex. The authors address both problems, proposing static load balancing via a two-level routing algorithm, possibly complemented by per-(large-)flow scheduling using a centralized scheduler; to improve wiring, they propose a packaging solution.
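My understanding of the two-level scheme, as a Python sketch with made-up addresses (not the paper's exact tables): a first-level prefix match routes traffic down toward a pod, while a second-level suffix match on the host octet spreads upward traffic across core switches.

```python
import ipaddress

# Prefix entries route *down* toward pods; suffix entries on the host
# octet spread *up* traffic across core switches for load balancing.
prefix_table = {
    "10.1.0.0/24": "down_port_0",
    "10.1.1.0/24": "down_port_1",
}
suffix_table = {0: "up_port_0", 1: "up_port_1"}  # keyed on host_id % ports

def route(dst_ip):
    addr = ipaddress.ip_address(dst_ip)
    for prefix, port in prefix_table.items():
        if addr in ipaddress.ip_network(prefix):
            return port                               # level 1: prefix match
    host_id = int(dst_ip.split(".")[-1])
    return suffix_table[host_id % len(suffix_table)]  # level 2: suffix match

print(route("10.1.1.7"))   # down_port_1 (prefix hit)
print(route("10.2.3.5"))   # up_port_1 (suffix: 5 % 2 = 1)
```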
I liked the paper and I think it will be influential in the next 10 years, not necessarily through deployments of this exact form, but by potentially triggering a paradigm shift in data center design.
Monday, February 2, 2009
The paper then describes replication topologies ranging from one data center up to five. The high-level idea is to have one or two master servers per data center and keep them consistent by linking them all in a circular topology (a connected component with one redundant link). To return consistent data even under master failure (within the same data center), the design also uses hub replicas, one per master per data center (read-only). The paper further optimizes the design for read performance with an extra layer of client servers (below the hub layer, read-only like the hubs).
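Purely as an illustration of the circular topology (all names invented, and ignoring the hub and client layers), a Python sketch in which each master applies an update locally and forwards it around the ring until it returns to the origin:

```python
# Illustrative only: masters linked in a ring, each forwarding updates
# to its successor until the update comes back to its origin.
masters = ["dc1", "dc2", "dc3", "dc4", "dc5"]
stores = {m: {} for m in masters}

def replicate(origin, key, value):
    """Apply locally, then push around the ring back to the origin."""
    i = masters.index(origin)
    node = origin
    while True:
        stores[node][key] = value
        i = (i + 1) % len(masters)
        node = masters[i]
        if node == origin:
            break

replicate("dc2", "campaign_9", "budget=100")
print(all(s.get("campaign_9") == "budget=100" for s in stores.values()))
```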
Random thoughts: the design is interesting, as consistency is a hard property to achieve, but the presented design is not motivated (at least not in this section) and alternative designs are not really discussed. It is interesting that, in general, the more mechanism is thrown at coping with failures (e.g., more redundancy), the more complex the system design becomes, and more errors can occur due to this complexity. One can see in the paper the reliability vs. overhead (delay, communication) tradeoff. Finally, the authors mention that the links between masters are “separate”, but I did not understand whether they are dedicated links or just a separate outgoing data center interface (if dedicated, I would wonder about the failure probability of routing vs. that of the dedicated links).
I found interesting the survival probability breakdowns by age after a scan error and after a reallocation, as well as the failure rate by age. I also liked the tracing/detection mechanism (built on the Google stack), which is quite general.
I think these results may stay interesting for quite some time, since such detailed studies of large populations are hard to conduct and not often published. However, at least to me, a breakdown by model/manufacturer before drawing all of these charts would have made much more sense.
Crash: Data Center Horror Stories – this article describes a data center failure due to bad power cabling (quite a hidden failure). The most important lessons learned: avoid single points of failure, and use specialized (data-center-specific), highly reliable equipment.
Data Center Failure As A Learning Experience – this article emphasizes the role of planning and design in data center construction to “fail small”, since “fail never” may not be achievable. The article also notes that most failures are concentrated shortly after equipment is put into service and when it approaches end of life.