Tech Voice

Recently, I migrated one of my applications to Spring 3. The project was previously using Spring 2.0.8. Spring 2 was released in 2006, so it was high time that we migrated to a later version, and there are plenty of reasons to migrate to Spring 3. Although Spring 3 was released in December 2009, its milestones and other pre-releases had been around and actively used by the Spring community for more than a year.

The application that we wanted to migrate was fairly large, with 150k lines of code (tests not included). Since this was a heavily used application, we couldn't afford to have many regression errors. Luckily, we had about 6500 unit tests and about 1700 functional tests running constantly in a continuous integration system. Having so many tests execute every time we changed something basically ensured that we didn't have too many regression errors. It also considerably boosts the confidence level when we want to make a change to the application.

The first thing you will notice when upgrading to Spring 3 from Spring 2 is that it is packaged in a bunch of jars (or Maven dependencies). This changed in Spring 2.5 and is generally regarded as a good move, although it makes upgrading a bit of a pain (even with Maven). The good part is that the jar/dependency names are pretty self-descriptive. It'll take a couple of hit-and-miss tries to get all the dependencies onto the classpath.

Now, let's move on to using Spring 3.
API Compatibility:
Spring 3 is generally API compatible with the older versions. The only major difference you might find is that your compiler might warn about generics. Older classes like JdbcTemplate got a generics refit to their APIs.
For example: The return values of methods like queryForList got changed to give a more meaningful signature.
Before:
List<Map> userDetails = jdbcTemplate.queryForList

After:
List<Map<String, Object>> userDetails = jdbcTemplate.queryForList

Apart from these, there are also a few minor API changes.
Example: AbstractBeanFactory.getMergedBeanDefinition(..) returns BeanDefinition instead of RootBeanDefinition.
Most of these API changes are minor and should be easy to resolve.

LifeCycle:
If you are using any of these beans and had declared any of them as lazy-init, you may be in for a small surprise.
DefaultMessageListenerContainer
GenericMessageEndpointManager
JmsMessageEndpointManager
SchedulerFactoryBean
SimpleMessageListenerContainer

These components implement the new SmartLifecycle interface and are eligible for auto-startup. Previous versions of Spring lacked a nice way to start services in a given order. The depends-on attribute doesn't really help if there are a lot of components to be started, especially if they are spread across a bunch of XML files. The Lifecycle interface provided a crude way of achieving ordered bean startup, albeit with some manual coding. The Phased and SmartLifecycle interfaces take it a step further and give you the ability to automatically start your services, in order, when Spring starts up. More on this in a later blog post.

A SmartLifecycle bean can declare that it needs to be auto-started (by returning true from the isAutoStartup() method). When the ApplicationContext starts, it looks for any SmartLifecycle beans present in the context and initializes all of them, even if they are marked as lazy. It then selectively starts the ones that are marked for auto-startup. This means that any of these beans that were previously declared lazy-init will now be initialized as part of application context startup. This is not really a bug, as the isAutoStartup() method is available to Spring only after the bean has been initialized. The consolation, if any, is that these beans are initialized in a later phase, during the startup of SmartLifecycle beans.
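To make the surprise concrete, here is a minimal sketch in plain Java. It does not use Spring at all: SketchLifecycle is a hypothetical stand-in for SmartLifecycle (only isAutoStartup(), getPhase() and start() are modelled), and the bean names and phases are made up. It only illustrates why a "lazy" definition still gets instantiated and how phase ordering decides the start order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for Spring's SmartLifecycle; NOT the real interface.
interface SketchLifecycle {
    boolean isAutoStartup();
    int getPhase();
    void start();
}

final class LifecycleSketch {
    // Records what happened, in order, so the effect is visible.
    static final List<String> EVENTS = new ArrayList<>();

    // Constructing the bean stands in for initializing it.
    static SketchLifecycle bean(final String name, final boolean autoStartup, final int phase) {
        EVENTS.add("init " + name);
        return new SketchLifecycle() {
            public boolean isAutoStartup() { return autoStartup; }
            public int getPhase() { return phase; }
            public void start() { EVENTS.add("start " + name); }
        };
    }

    // Mimics context startup: every lifecycle bean is instantiated, lazy or not,
    // because isAutoStartup() can only be asked of a live instance.
    static void refresh(Map<String, Supplier<SketchLifecycle>> lazyDefinitions) {
        List<SketchLifecycle> beans = new ArrayList<>();
        for (Supplier<SketchLifecycle> definition : lazyDefinitions.values()) {
            beans.add(definition.get());
        }
        // Lower phases start first; only auto-startup beans are started.
        beans.sort(Comparator.comparingInt(SketchLifecycle::getPhase));
        for (SketchLifecycle bean : beans) {
            if (bean.isAutoStartup()) {
                bean.start();
            }
        }
    }

    // Three made-up "lazy" definitions; returns the recorded events.
    static List<String> demo() {
        EVENTS.clear();
        Map<String, Supplier<SketchLifecycle>> definitions = new LinkedHashMap<>();
        definitions.put("scheduler", () -> bean("scheduler", true, 10));
        definitions.put("jmsListener", () -> bean("jmsListener", false, 0));
        definitions.put("mailer", () -> bean("mailer", true, 5));
        refresh(definitions);
        return new ArrayList<>(EVENTS);
    }
}
```

In this sketch all three beans are initialized, even jmsListener, which never asks to be started. That is the same effect we saw with our lazy-init beans.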

In our application, we were using a few of these beans. While testing, we were counting on these beans not getting initialized due to the lazy-init attribute. But, with Spring 3 these beans were all getting initialized, which was undesirable. The correct resolution was to move these beans out of the bean context into a separate context xml file that is never loaded during testing.

We've been running Spring 3 in production for a little over 2 weeks now, with no issues. Overall, the Spring 3 upgrade was pretty painless and straightforward. I think the Spring team did a good job of trying to preserve API compatibility and to make sure that there were no regression errors in the APIs.

If you are using an older version of Spring, it is time to move to Spring 3.

Of course, you should follow me on Twitter.

Effective GWT: Developing a complex, high-performance app with Google Web Toolkit

Lombardi Software - Blueprint
GWT What and Why

Generates optimized javascript (like escape analysis etc)

High Fidelity Mockup

Done in photoshop
More expensive
Finalize the icons and colors etc

Going to code

Be involved in the design
You need to know css and HTML DOM
What is the appropriate DOM structure
How to create and manipulate GWT

Design

Design outer layer with divs
Faster way is to do html panel and divs ? (what does that mean?)
DOM structure is created by GWT decorator panel
And, you can apply css on them

Handling Window Resizing

Goal is to handle browser window resizing
Static HTML you're limited to what you can achieve in css
Listen to ResizeEvent from window and propagate sizes down to children
Because they only do fixed sized row, they can do background images to do styling in table rows

Animation

Not all browsers do CSS3 ?
Helps users understand the behavior of application (provides visual feedback)
Done all in java in GWT

Original Implementation

Iterate through your objects, create widgets and add it to containers
Javascript - object creation and GC is expensive especially in IE6

New Implementation

Generate raw HTML in Javascript
Use flyweight pattern for event handling
They create html inside java (javascript) and do a DOM.setInnerHTML()
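A rough sketch of the idea, as I understood it: build one HTML string for all rows and let a single handler work out which row was hit, instead of creating a widget per row. This runs on a plain JVM, so the GWT call itself (the DOM.setInnerHTML() mentioned above) is left out. The class name, the row-N id scheme and the onRowClicked method are my own assumptions, not Lombardi's code.

```java
import java.util.List;

final class RawHtmlSketch {
    // Escapes the characters that matter inside element text.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds one HTML string for all rows instead of one widget object per row.
    static String render(List<String> items) {
        StringBuilder html = new StringBuilder("<div class=\"rows\">");
        for (int i = 0; i < items.size(); i++) {
            html.append("<div class=\"row\" id=\"row-").append(i).append("\">")
                .append(escape(items.get(i)))
                .append("</div>");
        }
        return html.append("</div>").toString();
    }

    // Flyweight-style handling: one shared handler for the whole container maps
    // the id of the clicked element back to the model item.
    static String onRowClicked(List<String> items, String elementId) {
        int index = Integer.parseInt(elementId.substring("row-".length()));
        return "clicked " + items.get(index);
    }
}
```

In a real GWT app the string from render() would be handed to the container element in one call, and the single handler would read the id from the event target.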

Event Handling
When All Else Fails

They dual compile code to Java and Javascript
If they find that the browser's javascript engine is slow, they render it on the server and send it to the client
So based on performance, they can dynamically move rendering between server and client

Compiling GWT code is slow

By default, GWT compiles code for 5 different browsers
You can tell GWT to compile code only for a single browser and locale, which speeds up development time
Well, you can run hosted mode and that never compiles :-) or use GWT 2.0 it never compiles and supports out of process hosted mode !
Instead of doing DOM manipulation over objects like Element.getStyle().setProperty('css property'), put that property in a css file
Checkout Episodes plugin from the creator of YSlow. It sends client performance numbers back to server.

This presentation is more of a war story. It deals with the Blueprint product. Because most of their clients run it in IE6, Lombardi had to go through extra steps to optimize their application to rely less on IE6's javascript engine. These techniques also apply when you have a really rich GWT application.

Transactions Across Datacenters (and Other Weekend Projects)

Consistency

Talked about Weak Consistency, Eventual Consistency (thanks to Werner for making this popular), Strong consistency (AppEngine datastore, File systems, RDBMSes, Azure tables)

Transactions
Why across datacenters?

Catastrophic failures, expected failures, routine maintenance, geo locality (CDN, edge caching etc)
Basically vertically partitioning your application
Packet roundtrip from west to east coast is 30ms

Why not across datacenters

Within a datacenter, it costs much less to communicate, low latency (1ms within rack, 1-5 ms across)
Outside datacenter

Expensive
High latency

Multihoming

????
As soon as you write across multiple locations, you will have consistency problems
Realtime writes are always the hardest
Don't do it
A datacenter in silicon valley went down and twitter and friendfeed went down for more than 2 hrs. Both did not have multihoming

Option 2:

Better but not ideal

Have multiple datacenters, have primary and secondary
Mediocre at catastrophic failure
window of lost data because of asynchronous replication

Examples:

Amazon Web Services
Banks, Brokerages etc

Depending on systems, all your slaves can serve reads

Option 3: True Multihoming

Simultaneous writes in different data centers
Two way: hard
NASDAQ does 2 datacenters and does 2phase commit across them for transactions
Expensive and definitely slower

Techniques and Tradeoffs

Backups

Make a copy
Dog Fooding - they make other teams use their internal systems so they get to iterate and then release the API

Master Slave replication

Usually asynchronous

Good for throughput, latency

Most RDBMSs do binary log based replication
AppEngine also follows this model.
AppEngine write is much slower than a relational db
But, it is geared for read more than write

Multi Master Replication

Support writes at multiple locations and then merge them
Asynchronous, eventual consistency (Amazon's shopping cart service does this)
You cannot rely on a global clock
Because of this, you cannot do global transactions
Another way of thinking about this: it is like multithreading without locks

Two Phase Commit

Heavyweight, synchronous, high latency
Semi distributed as there is a coordinator
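For reference, a minimal in-memory sketch of two phase commit. The Participant class, its vote flag and the state strings are all made up for illustration. A real implementation also needs durable logs, timeouts and recovery, which is where the "heavyweight" part comes from.

```java
import java.util.List;

final class TwoPhaseCommitSketch {
    // Hypothetical participant, e.g. the storage node in one datacenter.
    static final class Participant {
        final String name;
        final boolean willVoteYes;
        String state = "INIT";

        Participant(String name, boolean willVoteYes) {
            this.name = name;
            this.willVoteYes = willVoteYes;
        }

        // Phase 1: vote on whether this participant can commit.
        boolean prepare() {
            state = willVoteYes ? "PREPARED" : "ABORTED";
            return willVoteYes;
        }

        // Phase 2: apply the coordinator's decision.
        void commit() { state = "COMMITTED"; }
        void abort() { state = "ABORTED"; }
    }

    // The coordinator: collect every vote, then broadcast a single decision.
    // Returns true if the transaction committed.
    static boolean run(List<Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
            }
        }
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.abort();
            }
        }
        return allYes;
    }
}
```

Both loops are synchronous round trips to every participant, which is why latency hurts so much across datacenters, and the coordinator is the single point that makes it only semi distributed.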

Paxos

Fully distributed consensus protocol
No single master like 2PC
Still has longer latency
Gives a better throughput than 2PC

Paxos for the Datastore

Closer datacenter? not really because you are doing two round trips
Same datacenter? no
Opt In...

Paxos for AppEngine

They use that to coordinate when moving between datacenters
Use a lock server
Managing memcache

Conclusion

No silver bullet
Embracing tradeoffs
Consistency is app driven, the platform cannot make that choice.
AppEngine is going to support options in consistency models in future (Nice)

Building Scalable Complex apps on AppEngine:

List Property

Property has multiple values
Maintains its order
Queried with an equals filter
Densely pack information instead of denormalizing it
Cut across all data and query on one of the values in the list property
select * from FavoriteColors where color = 'yellow' (here color is a list property)
Saves space to use list property
Uses more CPU to serialize and deserialize the list property
Never have composite index between two list properties because it creates a cartesian product index
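A small sketch of the two points above that are easiest to get wrong: an equals filter matches when any one value in the list matches, and a composite index over two list properties needs one row per pair of values. This is plain Java with made-up names; it models the behaviour, not the datastore API.

```java
import java.util.ArrayList;
import java.util.List;

final class ListPropertyIndexSketch {
    // An equals filter on a list property matches if any one value matches.
    static boolean matches(List<String> listProperty, String wanted) {
        return listProperty.contains(wanted);
    }

    // A composite index over two list properties needs one row for every
    // (first value, second value) pair: the cartesian product.
    static List<String> compositeIndexRows(List<String> first, List<String> second) {
        List<String> rows = new ArrayList<>();
        for (String a : first) {
            for (String b : second) {
                rows.add(a + "|" + b);
            }
        }
        return rows;
    }
}
```

With 3 values in one list and 4 in the other, a single entity already needs 12 index rows; with 100 and 100 it would be 10,000.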

Concrete Example: Microblogging

Fanout of messages can be inefficient in terms of space
Message sending by reference
You would use list properties instead of joins
select * from messages where receiver = 'user'

Problem with List Property

selects load all of the list properties

Relational Index Entity

Split the message into two entities (message index and message)
We put them into same entity group and make message index a child of the message
There is a keys-only query that lets you fetch just the keys
Reads are 10 times faster and cheaper than with just plain list properties
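Here is how I picture the split, as an in-memory sketch. Two maps stand in for the two entity kinds, and the scan in messageKeysFor stands in for the keys-only query (in the datastore the index does that work, not a loop). All names are my own assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RelationalIndexSketch {
    // "Message" entities: message key -> body. No receiver list stored here.
    private final Map<String, String> messages = new LinkedHashMap<>();
    // "MessageIndex" entities, keyed by their parent message key -> receivers.
    private final Map<String, List<String>> messageIndexes = new LinkedHashMap<>();

    void send(String messageKey, String body, List<String> receivers) {
        messages.put(messageKey, body);
        messageIndexes.put(messageKey, new ArrayList<>(receivers));
    }

    // Stand-in for the keys-only query on the index kind: only keys come back,
    // so no receiver list has to be loaded by the caller.
    List<String> messageKeysFor(String receiver) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : messageIndexes.entrySet()) {
            if (entry.getValue().contains(receiver)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    // Fetch the (small) message entities by the parent keys found above.
    List<String> inbox(String receiver) {
        List<String> bodies = new ArrayList<>();
        for (String key : messageKeysFor(receiver)) {
            bodies.add(messages.get(key));
        }
        return bodies;
    }
}
```

The point is that reading an inbox never pulls the receiver lists back to the application; only the keys and the message bodies travel.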

Merge Join

AppEngine supports self joins
Data mining like operations
Don't have to build indexes in advance before this query
Can be used to test set membership

How does Merge Join work?

The datastore doesn't have histograms (RDBMSs use histograms to make a query plan)
They store all property indexes in sorted order
Uses zigzag algorithm
If we are using 2 filters, it scans the first property to find a match, then moves the second one to find a match. Then, if the keys don't match, it moves the first one until both the property and key match
select * from animal where legs = 4 and type = 'cow'
Scales with number of filters
Can't apply sort orders - must sort in memory
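The zigzag described above can be sketched as an intersection of two sorted key lists. Assumptions on my part: each filter (legs = 4, type = 'cow') is represented by a sorted array of unique entity keys, and "moving" an index forward is a binary search. The real datastore scans its own index tables; this only shows the leapfrogging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ZigzagMergeJoinSketch {
    // Returns the first position at or after 'from' whose key is >= target.
    static int seek(int[] sortedKeys, int from, int target) {
        int pos = Arrays.binarySearch(sortedKeys, from, sortedKeys.length, target);
        return pos >= 0 ? pos : -pos - 1;
    }

    // Intersects two sorted key lists by letting each side jump ahead to the
    // other side's current key, instead of scanning either list fully.
    static List<Integer> join(int[] first, int[] second) {
        List<Integer> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < first.length && j < second.length) {
            if (first[i] == second[j]) {
                matches.add(first[i]);   // same entity key under both filters
                i++;
                j++;
            } else if (first[i] < second[j]) {
                i = seek(first, i, second[j]);   // zig: first index catches up
            } else {
                j = seek(second, j, first[i]);   // zag: second index catches up
            }
        }
        return matches;
    }
}
```

Results come out in key order, which fits the note above that other sort orders have to be applied in memory.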

Keynote 2

New Google Product Google Wave

Platform
Product
Protocol

A wave is a conversation between multiple people.
Wave can be viewed as an enhanced twitter or a hybrid (e-mail, IM, and word document)

Comment Support - Allows for inline comments or your typical comments at the end of the wave/posting.
Real-time updates from others - The document changes real-time while others are updating it.
Spell Check - Very sweet inline real-time spell checking that takes the word context into account, e.g., for "Can I have some been soup" it offered "Can I have some bean soup"
Play back changes - You can play back the changes in the conversation to see what was done and in what order. Very much like playing back a video.
Private replies - Support for private conversations between users, hidden from others on the wave.
Drag and Drop - Wave supports D&D from iPhoto.
Uses Google Contacts - Integrates with your GMail and GTalk contacts
Wave cloning - Wave allows for cloning of an existing Wave to create a new wave. Reminds me of Git :)

When this occurs, all subscribers or people in the wave are notified.

Inline editing - Supports inline editing from Wave and external websites that are using the Wave plug-ins.

When changes are made, the document is marked up to reflect where the changes were done, ONLY from the last time *you* viewed it.

Collaboration Editing - Awesome support for changing content in the same document in the same area and changes are reflected in different colors.
Open Social Integration

Any open social app can live inside wave

Developer API -

A demo was given of a custom widget that allows a user to vote "Yes, Maybe, No"
Sudoku Widget that allows multiple players to play with one another.
Chess Widget that allows others to play one another and uses the playback feature. Nice integration.
Google Maps Widget that shows all other users where you're looking. Also, draw regions, add pins and more! Very sweet!!

Search - You can search your contacts or use their built in Google search to search the web and actually use the results in the document. i.e., search for an image and select it to embed it into the document.
Multiple languages - Supports multiple languages.

Real-time language translation using a program called "Rosy". This was very sweet!

Polls - A nice little extension allows for creating polls.
System Federation between Wave 'systems'

Wave systems can collaborate between one another.
Private Waves between people within the same 'wave system' are never sent to other wave servers in the federation.

Forms are native to Wave
They are going to Open Source the lion's share of the code.
Written in GWT and HTML5
Demos

There was a demo of dragging and dropping a file into the browser to create an attachment. This is not supported in HTML5 yet, it is a prototype.
A nice demo of how the API is used to write an external application like a blog. Very cool demo.
Orkut demo with using their embedded API.
Nice twitter integration that signs in to twitter and actually will post tweets.
Very sweet code.google.com integration tool.

All attendees at Google I/O will obtain an account to use Google Wave before it's released. Hence, come see Google IO. Did I mention we also got a free Cell phone with a full month of unlimited service?
Comes with a developers API (We're talking about Google, it's expected! :)
Minor bug occurred when doing the demo. Hey, we're talking about live demos (turned out to be a misconfigured browser proxy).
Google Wave is great for team collaboration: adding inline comments, embedding images, viewing changes (including live changes), removing the conversation noise, releasing a final product, and more.

Website URLs

http://wave.google.com - Main website for Wave
http://code.google.com/apis/wave - API website
http://waveprotocol.org - The protocol website

Offline Processing on App Engine: A Look Ahead

By: Brett
Live Notes by @dushyanth

Motivation

AppEngine is great for request based database backed applications
Cron is good for periodic jobs, but not good enough

Problems with Polling

Wasted work as it is not event driven
Workers stay resident when there is no work wasting resources
Fixed number of workers. Or admins must manually add workers
Limited amount of optimization possible

Long lived hanging connections

Existing task queue like systems

MQ, Amazon SQS, Azure Queues, Starling (getting popular these days)

Task Queue API

Part of AppEngine Labs (API may change until it graduates from Labs)
Asynchronous execution for a first in first out queue.
If execution fails, work will be retried until successful
Tasks are light weight to store. They are 3 times faster than storing in the datastore.
Tasks are scalable. The tasks can be started across a lot of machines.
Implements queueing. NOT pub-sub
Goals: High throughput, maximizing data throughput
Pushes tasks to the app. No polling
Uses Web hooks (It is a RESTful push-based interface for doing work)
Task is submitted as a web hook. If you get a 200 back, it succeeds.
Essentially combines queuing over REST.
Integrated into admin console as normal requests
Supports config driven throttling

Can be used to Prevent web services (external) from getting overloaded
Stay inside budget per hour etc

How task Queue Works

Tasks enqueue in a queue
Queue Moderator pulls from the head of the queue
It submits the task to the workers. Queue Moderator has capability to create new workers (threads).
Max number of threads depends on throughput
When a task is submitted, it could be running even before the enqueue request API call returns :-)
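A toy version of the push model, to fix the idea: the queue pushes each task to a handler, and anything other than a 200 goes back on the queue for a retry. WebHook is a made-up interface standing in for an HTTP POST to the app. The maxDeliveries bound is my addition so the sketch always terminates; the notes say the real queue retries until successful.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PushQueueSketch {
    // Hypothetical web hook: receives the task payload, returns an HTTP status.
    interface WebHook {
        int post(String payload);
    }

    // Pushes tasks from the head of the queue to the hook. A 200 completes the
    // task; any other status re-enqueues it at the tail for a later retry.
    // Returns the tasks in the order they completed.
    static List<String> dispatch(Deque<String> queue, WebHook hook, int maxDeliveries) {
        List<String> completed = new ArrayList<>();
        int deliveries = 0;
        while (!queue.isEmpty() && deliveries < maxDeliveries) {
            String task = queue.pollFirst();
            deliveries++;
            if (hook.post(task) == 200) {
                completed.add(task);
            } else {
                queue.addLast(task);
            }
        }
        return completed;
    }
}
```

Note that a retried task can complete after tasks that were enqueued later, so first in first out is best effort once failures are involved.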

EdgeCases

Tasks have to be idempotent
Possible for a task to spuriously run twice even without failures.
You could use memcache or database to avoid it running twice, but that responsibility is on the developer
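One way a developer might guard against the double run is sketched below. A Set stands in for the memcache or datastore marker, and the class and method names are made up. The check and the work are not atomic here, so a real version would need the marker and the side effect to be written together (for example in one transaction).

```java
import java.util.HashSet;
import java.util.Set;

final class IdempotentTaskSketch {
    // Stand-in for a memcache or datastore marker keyed by task name.
    private final Set<String> completed = new HashSet<>();
    // Counts how many times the real work actually ran.
    int sideEffects = 0;

    // Runs the work only if this task name has not completed before.
    // Returns false for a duplicate delivery.
    boolean runOnce(String taskName, Runnable work) {
        if (completed.contains(taskName)) {
            return false;
        }
        work.run();
        completed.add(taskName);
        return true;
    }
}
```

With this guard, delivering the same task twice is harmless: the second delivery returns false and the work is skipped.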

Working with TaskQueues

Each task added to a single queue
You can create multiple queues per application

Working with ETA (Estimated time of Arrival)

How long until the task is executed
Different than "visibility timeouts" in other systems

Working with tasks: Names

Tasks can be named. If a task is not named, a name is auto-generated

Prevents tasks from accidentally being submitted multiple times

Concrete Example: Write behind cache

Minimizes writes with repeated cache flushing

Write new data to cache
Periodically read cache and persist to disk

To implement, user submits data to cache and a task to task queue
When the task queue is processed, task is dispatched and the task does a periodic read from the cache and writes to the datastore. Essentially using the TaskQueue as an executor
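The write behind idea can be sketched in memory like this. Maps stand in for memcache and the datastore, a Deque of Runnables stands in for the Task Queue, and a boolean flag stands in for whatever prevents duplicate flush tasks (a task name could play that role). None of this is App Engine API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WriteBehindSketch {
    final Map<String, String> cache = new HashMap<>();      // stand-in for memcache
    final Map<String, String> datastore = new HashMap<>();  // stand-in for the datastore
    final Deque<Runnable> taskQueue = new ArrayDeque<>();   // stand-in for the Task Queue
    int datastoreWrites = 0;

    private final Set<String> dirtyKeys = new HashSet<>();
    private boolean flushPending = false;

    // Writes go to the cache; at most one flush task is queued at a time.
    void write(String key, String value) {
        cache.put(key, value);
        dirtyKeys.add(key);
        if (!flushPending) {
            flushPending = true;
            taskQueue.add(this::flush);
        }
    }

    // The task body: persist the latest cached value of each dirty key once.
    private void flush() {
        flushPending = false;
        for (String key : dirtyKeys) {
            datastore.put(key, cache.get(key));
            datastoreWrites++;
        }
        dirtyKeys.clear();
    }

    // Stand-in for the queue dispatching its tasks.
    void runQueuedTasks() {
        while (!taskQueue.isEmpty()) {
            taskQueue.poll().run();
        }
    }
}
```

Three quick writes to the same key end up as a single datastore write of the last value, which is the whole point of the pattern. The tradeoff is that anything still sitting in the cache is lost if the cache is evicted before the flush runs.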

Python only at first. Java comes next
Java support in the works

Webhooks, JMS

The Future

Batch Processing

Task Queue is good for small datasets (<100k)
More tools needed for parallelization

Map Reduce in future

Eventually
Want it to work with small and large (Terabyte scale) datasets

The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore

Live Notes by @dushyanth

Datastore is

Transactional
Natively Partitioned - developer does not have to worry about scaling
Hierarchical - every entity can have a notion of parent
Schemaless - no restricted structure
Based on BigTable
Not a relational database
Not a SQL Engine

Simplifying Storage

Simplifies

Development
Management of applications

Scale always matters

Request volume
Data volume

Datastore Storage Model

Entity consists of

Kind
Key
Entity Group
0..n properties
If entity group == key, the entity is a parent

Heterogeneous property types. Properties can be of different types in different entities
Supports multi valued properties
Variable property - Having the same properties between entities is not needed
Soft Schema

It is a schema whose constraints are enforced only in the application layer
Simpler development process

Rapid typesafe prototype

Can be enforced by JDO or JPA metadata mappings
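To show what "enforced only in the application layer" can mean, here is a hand-rolled sketch. It is not JDO or JPA; the schema map and the violations method are made up. The datastore would accept any of these entities, and the check lives entirely in application code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SoftSchemaSketch {
    // Checks an entity's properties against the types the application expects.
    // Returns a list of problems; an empty list means the entity conforms.
    static List<String> violations(Map<String, Class<?>> schema, Map<String, Object> entity) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> rule : schema.entrySet()) {
            Object value = entity.get(rule.getKey());
            if (value == null) {
                problems.add("missing " + rule.getKey());
            } else if (!rule.getValue().isInstance(value)) {
                problems.add("wrong type for " + rule.getKey());
            }
        }
        return problems;
    }
}
```

With JDO or JPA the same constraints would come from the class definition and its metadata instead of a map, which is what gives you the typesafe prototype.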

Transactions

Only transact within an entity group

Relationship Management

JDO and JPA are not just about object relationships

Transparent persistence
Object view of your data
Centralized mapping
Big maintainability win
AppEngine decides and manages which entity group the entity belongs to
Uses ownership to enforce entity group colocation

Future JDO/JPA work

Support unowned relationships

Bringing existing code to App Engine

Datastore is not a drop in replacement for RDBMS
Plan for data migration
Primary Keys

Single property keys: Straight forward way to map single property keys
Composite keys: Can map to ancestor chain
Mapping table: Can be represented using multi-value properties

And can be queried with set membership

Transactions

Identify roots in the data model
Identify operations that transact on multiple roots
Analyze impact of partial success

Refactor
Run compensating logic

Queries

Shift processing from reads to writes.

Denormalize
Expensive write and cheap reads

Google Wave Client: Powered by GWT

Live Notes by @dushyanth

Wave UI requirements

It has got to be fast
Stunning
Optimistic UI

JSNI

Java can call javascript

Client Architecture

Bidirectional communication channel - keep alive http
Protocol Compiler

Generates interfaces, client + server implementations

Code Heavy

Can use UIBinder to plop GWT components into html

Most bugs are from CSS

Style Injector + CssResource
Looks like Minification + Image Spriting is done by GWT
Allows modularization of CSS
Different CSS for different browsers

Inefficient JSON handling

JSO - Javascript object structure
Didn't quite get it

Hosted mode isn't quite browser like

OOPHM (Out Of Process Hosted Mode) - Browser plugin to debug in eclipse

Download Size

runAsync(dynamic loading of code)
Download lazily

No transparency between javascript and java

SOYC (Story of your compile) reports
Java package to javascript breakdown report

JSOs cannot implement interfaces

SingleJsoImpl
In order to inline, JSOs cannot have polymorphic dispatch
At most one JSO class can implement any given interface

Improving Gears

Client side thumbnailing

They create a thumbnail using the workerpool before uploading the image to server.

Desktop drag n drop
Resumable uploading

Performance

Startup

runAsync
fast start
inline images + css
smaller download
stats collection
server-side script selection

Server sends down the correct javascript + css files based on http headers

Loaded Client

Optimistic UI (trying to guess what the user will click next)
Prefetching
Flyweight pattern
Rendering tricks

Mobile Client

Deferred binding saves the day
iPhone browser is always running
It loads faster than native apps

Testing

Use Model View Presenter design pattern - how is it different from MVC?
Prefer JUnit tests over GWTTestCase
Browser automation - WebDriver
Web driver is a developer focused tool for browser automation
Has native keyboard and mouse events, rather than synthesised via JS
iPhone Driver - automated testing on iPhone
Remote Web Driver - so web testing can be farmed out into a grid


Wednesday, March 24, 2010

What to expect while Migrating to Spring 3

Friday, March 5, 2010

The first Date sucked. Can a second Date fix it? JSR 310

Wednesday, May 27, 2009

Live Google IO notes
