Friday, March 28, 2008

TSSJS 2008 Day Three - Synopsis

Session II – eBay Market Place Architecture – Randy Shoup

What happened to Session I's coverage? Let's just blame it on the night before :-)

We all know eBay doesn’t do transactions. Well, at least no client-side transactions. Their databases run on auto-commit. One way they deal with it is by carefully ordering database operations (example: inserting the slave record first and then the master, so that a visible master is always consistent). They also have reconciliation jobs that go through the database and cleanse it periodically.
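A rough sketch of that write ordering, assuming a made-up two-table schema (master, detail) on an in-memory SQLite database in autocommit mode. None of the names come from the talk; it only illustrates the idea of writing dependents first and cleaning up leftovers later.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each statement commits on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE detail (master_id INTEGER, note TEXT)")


def save(master_id, title, notes, fail_before_master=False):
    """Write dependent rows first and the master row last (hypothetical helper)."""
    for note in notes:
        conn.execute("INSERT INTO detail VALUES (?, ?)", (master_id, note))
    if fail_before_master:
        raise RuntimeError("simulated crash between the two writes")
    # Once the master row is visible, everything it refers to is already stored.
    conn.execute("INSERT INTO master VALUES (?, ?)", (master_id, title))


def reconcile():
    """Periodic cleanup: delete detail rows whose master never arrived."""
    cur = conn.execute(
        "DELETE FROM detail WHERE master_id NOT IN (SELECT id FROM master)"
    )
    return cur.rowcount
```

A crash halfway through leaves orphaned detail rows but never a master that points at missing data. A real reconciliation job would presumably also skip very recent rows so it does not race with writes still in flight.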

Strategies for scalability used by eBay
1. Partition
2. Asynchrony
3. Automate Everything
4. Remember Everything Fails

1. Partition

Obviously, they don’t use sessions and they don’t cache business objects (surprisingly). As expected, they use URL rewriting and cookies to track the user. If the data they have to keep about the user is larger than will fit in these two schemes, they use a scratch database. Since they don’t cache business / user related data, they do hammer the databases for all their queries. To handle this situation, they use a custom sharding solution over their ORM and partition their database based on functional divides in the application.
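The talk did not show eBay's sharding layer, so the following is only a guess at the general shape: pick a pool of databases by functional area, then pick one host in the pool from a stable hash of the key. The host names and the SHARDS table are invented for illustration.

```python
import zlib

# Hypothetical layout: one pool of database hosts per functional area.
SHARDS = {
    "users": ["users-db-0", "users-db-1"],
    "items": ["items-db-0", "items-db-1", "items-db-2"],
}


def shard_for(area, key):
    """Route a key to one host inside the pool for its functional area."""
    hosts = SHARDS[area]
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return hosts[zlib.crc32(str(key).encode("utf-8")) % len(hosts)]
```

Plain modulo routing like this reshuffles most keys when a host is added, so a production setup would likely keep an explicit key-to-shard mapping instead.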

Search: Search queries come to an aggregator, which is actually a scatter-gather (from Enterprise Integration Patterns). This component forwards each search request to individual nodes, each responsible for indexing and searching just a part of the entire data space. The nodes then return their results to the aggregator, which aggregates (duh!) and displays them.
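To make the scatter-gather idea concrete, here is a toy version. SearchNode, its term-count scoring and the aggregate function are all assumptions of mine; the real search nodes obviously do far more.

```python
from concurrent.futures import ThreadPoolExecutor


class SearchNode:
    """Holds one slice of the data space (hypothetical stand-in for a search node)."""

    def __init__(self, docs):
        self.docs = docs  # {doc_id: text}

    def search(self, term):
        term = term.lower()
        # Score each matching document by how often the term occurs in it.
        return [
            (text.lower().count(term), doc_id)
            for doc_id, text in self.docs.items()
            if term in text.lower()
        ]


def aggregate(nodes, term, limit=10):
    """Scatter the query to every node, then gather and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.search(term), nodes))
    hits = [hit for partial in partials for hit in partial]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # best score first, then doc id
    return [doc_id for _, doc_id in hits[:limit]]
```

Each node only ever sees its own partition; the aggregator is the only piece that knows how many partitions exist.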

2. Asynchrony

The really hard part of messaging systems is guaranteeing once-only delivery. If you loosen this restriction, it is a lot easier to scale. They deal with duplicate events by modeling event processing to be idempotent. They deal with out-of-order events by having the consumer, once it receives an event, go to a service that returns the latest state for that event.
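One way to picture both tricks together: treat the event as nothing more than a hint that something changed, and have the consumer read the current state from the source of truth. StateService and Consumer below are hypothetical names, and this is a sketch of the idea rather than anything shown in the talk.

```python
class StateService:
    """Hypothetical source of truth that always knows the latest state."""

    def __init__(self):
        self._state = {}

    def update(self, entity_id, value):
        self._state[entity_id] = value

    def latest(self, entity_id):
        return self._state[entity_id]


class Consumer:
    """Treats each event as a hint that something changed, not as the change itself."""

    def __init__(self, service):
        self.service = service
        self.view = {}

    def on_event(self, entity_id):
        # Re-reading the latest state makes handling idempotent: a duplicate or
        # late event simply overwrites the view with the same current value.
        self.view[entity_id] = self.service.latest(entity_id)
```

Because the handler never applies a delta, it does not matter how many times an event arrives or in which order.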

3. Automate Everything

Part of that is adaptive configuration: consumers dynamically adjust to meet the SLA by changing parameters like event polling size, number of threads, etc. The adaptive configuration also adapts to changes in the number of consumer instances.
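The talk did not go into the actual control logic, so this is just a minimal sketch of what adjusting the polling size against an SLA could look like. The function name, the doubling/halving rule and the limits are all my own assumptions.

```python
def next_batch_size(current, lag_seconds, sla_seconds, minimum=10, maximum=1000):
    """Pick the next event polling size from how far behind the consumer is."""
    if lag_seconds > sla_seconds:
        # Missing the SLA: pull more events per poll, up to a ceiling.
        return min(maximum, current * 2)
    if lag_seconds < sla_seconds / 2:
        # Comfortably ahead: back off so the consumer uses fewer resources.
        return max(minimum, current // 2)
    return current
```

A consumer would call this after every poll, for example next_batch_size(100, lag_seconds=12, sla_seconds=10) to double its batch when it falls behind.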

He gave an example of an adaptive search experience. They have a feedback loop: they analyze the feedback offline, create metadata out of it and feed it to the system, which uses it to change its behavior. Perturbation is the idea that 90% of the time they recommend the optimal option, and 10% of the time they recommend new options B, C, D, etc., so that if D becomes popular, it will become the dominant recommended value. They also overweight negative feedback so that the oscillations are dampened. Pretty slick.
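The 90/10 split reads a lot like what is usually called epsilon-greedy selection. Here is a small sketch under that assumption; the score table, the function names and the penalty weight of 2.0 are invented for illustration.

```python
import random


def recommend(scores, rng, explore=0.1):
    """Return the best-scoring option most of the time, another option otherwise."""
    best = max(scores, key=scores.get)
    others = [option for option in scores if option != best]
    if others and rng.random() < explore:
        return rng.choice(others)  # perturbation: give B, C, D... a chance
    return best


def record_feedback(scores, option, positive, penalty=2.0):
    """Negative feedback counts more than positive feedback, to damp oscillation."""
    scores[option] += 1.0 if positive else -penalty
```

With scores = {"A": 5.0, "B": 1.0, "D": 1.0}, calling recommend(scores, random.Random()) mostly returns "A", and enough positive feedback on "D" makes it the new default.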

4. Remember Everything Fails

Some of the failure patterns used are failure detection, rollback and graceful degradation. Applications log to a message bus, and listeners on the bus automate the failure detection. It also gives them historical data, which is used for capacity planning. They get about 1.5TB of log messages every day :-) grep that.

Code rollout/rollback: They have a policy: NO changes to the site that cannot be undone. Each feature has a rollout plan, and there is a monster rollout plan for the 2 weeks. There is an automated tool that rolls out the code in reverse dependency order. The automated tool also does rollbacks.
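My reading of the rollout tool is that whatever a component depends on gets deployed before the component itself, and a rollback walks the same plan backwards. That is an assumption on my part, and the component names below are made up; the sketch uses Python's standard graphlib module (3.9+).

```python
from graphlib import TopologicalSorter


def rollout_order(depends_on):
    """Order components so that everything a component depends on ships first."""
    return list(TopologicalSorter(depends_on).static_order())


def rollback_order(depends_on):
    """Undo in the opposite order: dependents come off before what they depend on."""
    return list(reversed(rollout_order(depends_on)))


# Hypothetical components: each key maps to the set of things it depends on.
DEPENDS_ON = {
    "web": {"search", "billing"},
    "search": {"db"},
    "billing": {"db"},
    "db": set(),
}
```

Here rollout_order(DEPENDS_ON) starts with "db" and ends with "web", and rollback_order(DEPENDS_ON) is the exact reverse.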

Here is a cool feature. Every feature has an on and an off state. It allows them to turn a feature off rather than redeploying code that lacks that feature. It also allows them to deploy features turned off and then switch them on later. They are decoupling code deployment from feature deployment. From a developer's perspective, the code checks for feature availability. To blow my own horn a bit, I have built features in the past which can be turned on and off at runtime. I know what you’re thinking (don’t freak out, I don’t): this is similar to OSGi. Nope, not quite. OSGi is about deploying services and controlling their existence. I would say this may be similar from an implementation perspective, but the intent is quite different.
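A bare-bones sketch of the on/off idea, with hypothetical names (FeatureFlags, item_page, "similar_items"). It assumes nothing about how eBay actually stores or distributes the switches.

```python
class FeatureFlags:
    """Runtime on/off switches; everything is off until turned on."""

    def __init__(self):
        self._enabled = set()

    def turn_on(self, name):
        self._enabled.add(name)

    def turn_off(self, name):
        self._enabled.discard(name)

    def is_on(self, name):
        return name in self._enabled


def item_page(item, flags):
    """The new code path ships with the deployment but only runs when switched on."""
    page = {"title": item}
    if flags.is_on("similar_items"):
        page["similar"] = ["placeholder recommendation"]
    return page
```

The code for "similar_items" can be deployed days before anyone calls flags.turn_on("similar_items"), and turning it off again needs no redeploy.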

When a resource fails and it is not critical, the failure is safely ignored. If a critical service fails, they go to an async mode (and do the processing later) or fail over. When a service does come back up, you don’t want all clients hitting it at once. They have a phased way of letting clients hit it.
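A small sketch of the first two behaviours, assuming a made-up bidding flow in which recording the bid is the critical call and loading recommendations is the optional one. The function and parameter names are mine, and the phased recovery part is left out.

```python
from collections import deque

deferred = deque()  # work to replay once the critical service is back


def place_bid(bid, record_bid, load_recommendations):
    """Handle one request, degrading instead of failing when a dependency is down."""
    page = {"bid": bid, "status": "accepted"}
    try:
        record_bid(bid)  # critical call
    except ConnectionError:
        deferred.append(bid)  # async mode: queue it and process it later
        page["status"] = "pending"
    try:
        page["recommendations"] = load_recommendations(bid)  # non-critical call
    except ConnectionError:
        pass  # safely ignored: the page simply renders without this section
    return page
```

A background job would drain the deferred queue once the critical service responds again.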

Overall, this talk was very informative. Randy went through a lot of concepts in great detail at a very high pace. I felt there was so much more information left that the talk could have gone for one more hour at the same pace.

Session III – The Busy Java Developer’s Guide to Scala – Ted Neward

A pure functional language has no side effects. But you knew that already. This talk focused on giving a Scala intro to a Java developer. Again, I am not going to cover much of this talk as you can read about Scala yourself.

Lunch Keynote Panel: Patrick Linskey, Ted Neward and others with Eugene Ciurana as the mike boy ;-)

The conversation went towards the overabundance of frameworks in Java. One good point made was that you should never sit down to write a framework. You build an application and then extract a framework from it. That way, YAGNI is inherent in the effort. One of the very few web frameworks that was built like that is Rails.

Another point is, it has to be usable before it is reusable. Good one.

In answer to whether, given the appearance of terse, free-form languages (where syntax does not matter much), Java will allow syntax to be optional, it was pointed out that if you try to shoehorn additional features into the language, it will collapse under its own weight. The main thing in Java is the platform and the APIs and frameworks that are available. Additional languages that run on the Java platform but support a whole new set of features will see the light of day.

Session IV: Map Reduce – Why does it Matter – Eugene Ciurana

We looked at Map Reduce and worked through implementation scenarios in various business domains and the problems or hurdles that we would face in arriving at solutions using Map Reduce. The audience got to participate and overall, it was a good exercise. (There, I can talk in bullshit business lingo too :-) )

I'm sure you've seen this already, but it is too cool not to point out
http://members.aol.com/matt999h/bullshit.htm
