Tech Voice: 2009

Effective GWT: Developing a complex, high-performance app with Google Web Toolkit

Lombardi Software - Blueprint
GWT What and Why

Generates optimized javascript (like escape analysis etc)

High Fidelity Mockup

Done in photoshop
More expensive
Finalize the icons and colors etc

Going to code

Be involved in the design
You need to know css and HTML DOM
What is the appropriate DOM structure
How to create and manipulate it in GWT

Design

Design outer layer with divs
Faster way is to do html panel and divs ? (what does that mean?)
DOM structure is created by GWT decorator panel
And, you can apply css on them

Handling Window Resizing

Goal is to handle browser window resizing
With static HTML, you're limited to what you can achieve in CSS
Listen to ResizeEvent from window and propagate sizes down to children
Because they only do fixed sized row, they can do background images to do styling in table rows

Animation

Not all browsers do CSS3 ?
Helps users understand the behavior of application (provides visual feedback)
Done all in java in GWT

Original Implementation

Iterate through your objects, create widgets and add it to containers
Javascript - object creation and GC is expensive especially in IE6

New Implementation

Generate raw HTML in Javascript
Use flyweight pattern for event handling
They create html inside java (javascript) and do a DOM.setInnerHTML()
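The idea behind the new implementation can be sketched outside GWT. The snippet below is a minimal, language-neutral illustration in Python, not the Blueprint code: it builds one HTML string for all rows instead of one widget per row, and uses a single shared handler that looks up the row's data from an id. The names `render_rows`, `ListEventHandler` and the `data-id` attribute are assumptions made for this example.

```python
from html import escape


def render_rows(items):
    """Build one HTML string for every row, instead of one widget object per row."""
    parts = ['<div id="list">']
    for item_id, label in items.items():
        parts.append(
            '<div class="row" data-id="%s">%s</div>'
            % (escape(str(item_id), quote=True), escape(label))
        )
    parts.append("</div>")
    # In GWT this string would be handed to the DOM in one innerHTML assignment.
    return "".join(parts)


class ListEventHandler:
    """One shared handler (flyweight): it keeps no per-row state and finds
    the row's data from the id carried by the event target."""

    def __init__(self, items):
        self.items = items

    def on_click(self, target_id):
        return "clicked " + self.items[target_id]


items = {"a1": "Approve order", "b2": "Ship <now>"}
html = render_rows(items)
handler = ListEventHandler(items)
print(handler.on_click("a1"))
```

The cost of object creation is paid once for the handler rather than once per row, which is the point of the flyweight pattern here.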

Event Handling
When All Else Fails

They dual compile code to Java and Javascript
If they find that the browser's JavaScript engine is slow, they render it on the server and send it to the client
So based on performance, they can dynamically move rendering between server and client

Compiling GWT code is slow

By default, GWT compiles code to 5 different browsers
You can tell GWT to compile code only for a single browser - locale, this speeds up development time
Well, you can run hosted mode and that never compiles :-) or use GWT 2.0 it never compiles and supports out of process hosted mode !
Instead of doing DOM manipulation over objects like Element.getStyle().setProperty('css property'), put that property in a css file
Check out the Episodes plugin from the creator of YSlow. It sends client performance numbers back to the server.

This presentation is more of a war story. It deals with the Blueprint product. Because most of their clients run it in IE6, Lombardi had to go through extra steps to optimize their application to rely less on IE6's JavaScript engine. These techniques also apply when you have a really rich GWT application.

Transactions Across Datacenters (and Other Weekend Projects)

Consistency

Talked about Weak Consistency, Eventual Consistency (thanks to Werner for making this popular), Strong consistency (AppEngine datastore, File systems, RDBMSes, Azure tables)

Transactions
Why across datacenters?

Catastrophic failures, expected failures, routine maintenance, geo locality (CDN, edge caching etc)
Basically vertically partitioning your application
Packet roundtrip from west to east coast is 30ms

Why not across datacenters

Within a datacenter, it costs much less to communicate, with low latency (1ms within rack, 1-5 ms across)
Outside datacenter

Expensive
High latency

Multihoming

????
As soon as you write across multiple locations, you will have consistency problems
Realtime writes are always the hardest
Don't do it
A datacenter in silicon valley went down and twitter and friendfeed went down for more than 2 hrs. Both did not have multihoming

Option 2:

Better but not ideal

Have multiple datacenters, have primary and secondary
Mediocre at catastrophic failure
window of lost data because of asynchronous replication

Examples:

Amazon Web Services
Banks, Brokerages etc

Depending on systems, all your slaves can serve reads

Option 3: True Multihoming

Simultaneous writes in different data centers
Two way: hard
NASDAQ runs 2 datacenters and does two-phase commit across them for transactions
Expensive and definitely slower

Techniques and Tradeoffs

Backups

Make a copy
Dog Fooding - they make other teams use their internal systems so they get to iterate and then release the API

Master Slave replication

Usually asynchronous

Good for throughput, latency

Most RDBMSs do binary log based replication
AppEngine also follows this model.
AppEngine write is much slower than a relational db
But, it is geared for read more than write

Multi Master Replication

Support writes at multiple locations and then merge them
Asynchronous, eventual consistency (Amazon's shopping cart service does this)
You cannot rely on a global clock
Because of this, you cannot do global transactions
Another way of thinking about this: it is like multithreading without locks

Two Phase Commit

Heavyweight, synchronous, high latency
Semi distributed as there is a coordinator
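The two notes above can be made concrete with a minimal in-memory sketch of two-phase commit. It is an illustration only: the `Participant` class and `two_phase_commit` function are hypothetical names, and real systems also need timeouts, durable logs and coordinator recovery, none of which is modeled here.

```python
class Participant:
    """One datacenter taking part in the transaction (hypothetical stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes (and hold the prepared state) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """The coordinator: this single decision point is why 2PC is only semi distributed."""
    # Phase 1: every participant is asked to vote; one round trip each.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes; a second round trip each.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


west, east = Participant("west"), Participant("east")
print(two_phase_commit([west, east]), west.state, east.state)
```

Each phase is a synchronous round trip to every participant, which is where the high latency across datacenters comes from.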

Paxos

Fully distributed consensus protocol
No single master like 2PC
Still has longer latency
Gives a better throughput than 2PC

Paxos for the Datastore

Closer datacenter? not really because you are doing two round trips
Same datacenter? no
Opt In...

Paxos for AppEngine

They use that to coordinate when moving between datacenters
Use a lock server
Managing memcache

Conclusion

No silver bullet
Embracing tradeoffs
Consistency is app driven, the platform cannot make that choice.
AppEngine is going to support options in consistency models in future (Nice)

Building Scalable Complex apps on AppEngine:

List Property

Property has multiple values
Maintains its order
Queried with an equals filter
Densely pack information instead of denormalizing it
Cut across all data and query on one of the values in the list property
select * from FavoriteColors where color = 'yellow' (here color is a list property)
Saves space to use list property
Uses more CPU to serialize and deserialize the list property
Never have composite index between two list properties because it creates a cartesian product index
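The warning about composite indexes on two list properties can be illustrated with a small calculation. This is a rough sketch, assuming a composite index stores one row per combination of values; the function name `composite_index_rows` and the sample entity are made up for the example.

```python
from itertools import product


def composite_index_rows(entity, properties):
    """Return one index row per combination of values across the given properties."""
    value_lists = [
        entity[p] if isinstance(entity[p], list) else [entity[p]]
        for p in properties
    ]
    return list(product(*value_lists))


# A single entity with two list properties (hypothetical data).
entity = {
    "colors": ["red", "yellow", "blue"],
    "tags": ["a", "b", "c", "d"],
}

rows = composite_index_rows(entity, ["colors", "tags"])
print(len(rows))  # 3 colors x 4 tags = 12 index rows for one entity
```

The row count is the product of the list lengths, so two lists of a few hundred values each would already mean tens of thousands of index rows per entity.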

Concrete Example: Microblogging

Fanout of messages can be inefficient in terms of space
Message sending by reference
You would use list properties instead of joins
select * from messages where receiver = 'user'

Problem with List Property

selects load all of the list properties

Relational Index Entity

Split the message into two entities (message index and message)
We put them into same entity group and make message index a child of the message
There is a keys-only query that lets you fetch just the keys
Reads are 10 times faster and cheaper than with just plain list properties
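A minimal in-memory sketch of the relational index entity idea follows. It is not the datastore API: plain dicts stand in for the two kinds, string keys such as `"m1/index"` stand in for parent and child keys, and all function names are hypothetical.

```python
# In-memory stand-ins for the two datastore kinds (hypothetical).
messages = {}         # message key -> {"sender": ..., "body": ...}
message_indexes = {}  # child key "<message key>/index" -> list of receivers


def send(message_key, sender, body, receivers):
    """Store the small message entity and, separately, its receiver list."""
    messages[message_key] = {"sender": sender, "body": body}
    # The index entity is a child of the message, so its key embeds the parent key.
    message_indexes[message_key + "/index"] = list(receivers)


def keys_only_query(receiver):
    """Return only the matching index keys; the receiver lists are never returned."""
    return [key for key, receivers in message_indexes.items() if receiver in receivers]


def inbox(receiver):
    """Derive parent keys from the child keys, then fetch just the messages."""
    parent_keys = [key.rsplit("/", 1)[0] for key in keys_only_query(receiver)]
    return [messages[key] for key in parent_keys]


send("m1", "alice", "hi", ["bob", "carol"])
send("m2", "dave", "yo", ["carol"])
print([m["body"] for m in inbox("carol")])
```

The large receiver list is only touched by the index scan, so a read pays for the message bodies and nothing else.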

Merge Join

AppEngine supports self joins
Data mining like operations
Don't have to build indexes in advance before this query
Can be used to test set membership

How does Merge Join work?

They don't have histograms (RDBMSs use histograms to make a query plan)
They store all property indexes in sorted order
Uses zigzag algorithm
If we are using 2 filters, it scans the first property to find a match, then moves the second one to find a match. Then, if the keys don't match, it moves the first one until both the property and key match
select * from animal where legs = 4 and type = 'cow'
Scales with number of filters
Can't apply sort orders - must sort in memory
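The zigzag scan described above can be sketched as an intersection of sorted key lists, one list per equality filter. This is an illustrative sketch, not the datastore implementation; `zigzag_merge_join` and the sample keys are assumptions, and each list is assumed to hold unique keys in sorted order.

```python
from bisect import bisect_left


def zigzag_merge_join(*key_lists):
    """Intersect sorted key lists by leapfrogging each scan to the current candidate."""
    if not key_lists or any(not keys for keys in key_lists):
        return []
    positions = [0] * len(key_lists)
    results = []
    candidate = key_lists[0][0]
    while True:
        agreed = True
        for i, keys in enumerate(key_lists):
            # Jump this scan forward to the first key >= candidate.
            positions[i] = bisect_left(keys, candidate, positions[i])
            if positions[i] == len(keys):
                return results  # one scan is exhausted, so no more matches
            if keys[positions[i]] != candidate:
                # This scan overshot: its key becomes the new candidate.
                candidate = keys[positions[i]]
                agreed = False
        if agreed:
            results.append(candidate)
            positions[0] += 1
            if positions[0] == len(key_lists[0]):
                return results
            candidate = key_lists[0][positions[0]]


# Keys of entities matching each filter, in sorted key order (hypothetical data).
legs_is_4 = ["cow1", "cow2", "dog1", "horse1"]
type_is_cow = ["cow1", "cow2", "cow3"]
print(zigzag_merge_join(legs_is_4, type_is_cow))
```

Results come out in key order only, which matches the note that other sort orders must be applied in memory.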

Keynote 2

New Google Product Google Wave

Platform
Product
Protocol

A wave is a conversation between multiple people.
Wave can be viewed as an enhanced twitter or a hybrid (e-mail, IM, and word document)

Comment Support - Allows for inline comments or your typical comments at the end of the wave/posting.
Real-time updates from others - The document changes real-time while others are updating it.
Spell Check - A very sweet inline real-time spell checker that takes the word context into account. For example, for "Can I have some been soup" it offered "Can I have some bean soup"
Play back changes - You can play back the changes in the conversation to see what was done and in what order. Very much like playing back a video.
Private replies - Supports for private conversations between users hidden from others on the wave.
Drag and Drop - Wave supports D&D from iPhoto.
Uses Google Contacts - Integrates with your GMail and GTalk contacts
Wave cloning - Wave allows for cloning of an existing Wave to create a new wave. Reminds me of Git :)

When this occurs, all subscribers or people in the wave are notified.

Inline editing - Supports inline editing from Wave and external websites that are using the Wave plug-ins.

When changes are made, the document is marked up to reflect where the changes were done, ONLY from the last time *you* viewed it.

Collaboration Editing - Awesome support for changing content in the same document in the same area and changes are reflected in different colors.
Open Social Integration

Any open social app can live inside wave

Developer API -

A demo was given of a custom widget that allows a user to vote "Yes, Maybe, No"
Sudoku Widget that allows multiple players to play with one another.
Chess Widget that allows others to play one another and uses the playback feature. Nice integration.
Google Maps Widget that shows all other users where you're looking. Also, draw regions, add pins and more! Very sweet!!

Search - You can search your contacts or use their built in Google search to search the web and actually use the results in the document. i.e., search for an image and select it to embed it into the document.
Multiple languages - Supports multiple languages.

Real-time language translation using a program called "Rosy". This was very sweet!

Polls - A nice little extension allows for creating polls.
System Federation between Wave 'systems'

Wave systems can collaborate between one another.
Private Waves between people within the same 'wave system' are never sent to other wave servers in the federation.

Forms are native to Wave
They are going to Open Source the lion's share of the code.
Written in GWT and HTML5
Demos

There was a demo of dragging and dropping a file into the browser to create an attachment. This is not supported in HTML5 yet, it is a prototype.
A nice demo of how the API is used to write an external application like a blog. Very cool demo.
Orkut demo with using their embedded API.
Nice twitter integration that signs in to twitter and actually will post tweets.
Very sweet code.google.com integration tool.

All attendees at Google I/O will obtain an account to use Google Wave before it's released. Hence, come see Google IO. Did I mention we also got a free Cell phone with a full month of unlimited service?
Comes with a developers API (We're talking about Google, it's expected! :)
Minor bug occurred when doing the demo. Hey, we're talking about live demos (turned out to be a wrong configured browser proxy).
Google Wave is great for team collaboration: add inline comments, embed images, view changes as they happen, remove the conversation noise, release a final product and more.

Website URLs

http://wave.google.com - Main website for Wave
http://code.google.com/apis/wave - API website
http://waveprotocol.org - The protocol website

Offline Processing on App Engine: A Look Ahead

By: Brett
Live Notes by @dushyanth

Motivation

AppEngine is great for request based database backed applications
Cron is good for periodic jobs, but not good enough

Problems with Polling

Wasted work as it is not event driven
Workers stay resident when there is no work wasting resources
Fixed number of workers. Or admins must manually add workers
Limited amount of optimization possible

Long lived hanging connections

Existing task queue like systems

MQ, Amazon SQS, Azure Queues, Starling (getting popular these days)

Task Queue API

Part of AppEngine Labs (API may change until it graduates from Labs)
Asynchronous execution for a first in first out queue.
If execution fails, work will be retried until successful
Tasks are light weight to store. They are 3 times faster than storing in the datastore.
Tasks are scalable. The tasks can be started across a lot of machines.
Implements queueing. NOT pub-sub
Goals: High throughput, maximizing data throughput
Pushes tasks to the app. No polling
Uses Web hooks (It is a RESTful push-based interface for doing work)
Task is submitted as a web hook. If you get a 200 back, it succeeds.
Essentially combines queuing over REST.
Integrated into admin console as normal requests
Supports config driven throttling

Can be used to Prevent web services (external) from getting overloaded
Stay inside budget per hour etc
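The push model with web hooks can be sketched as a loop that hands each task to a handler and treats a 200 response as success. This is a toy model, not the Task Queue API: `run_queue`, `flaky_handler` and the `max_attempts` cap are assumptions added for the example (the notes say retries continue until success; the cap only keeps the sketch from looping forever).

```python
from collections import deque


def run_queue(tasks, handler, max_attempts=5):
    """Push each task to the handler; requeue it unless the handler returns 200."""
    queue = deque(tasks)
    attempts = {}
    done = []
    while queue:
        task = queue.popleft()
        attempts[task] = attempts.get(task, 0) + 1
        status = handler(task)  # stands in for an HTTP POST to the app's web hook
        if status == 200:
            done.append(task)
        elif attempts[task] < max_attempts:
            queue.append(task)  # failed: retry later
    return done, attempts


calls = []


def flaky_handler(task):
    """Hypothetical handler: "resize-image" fails twice before succeeding."""
    calls.append(task)
    if task == "resize-image" and calls.count(task) < 3:
        return 500
    return 200


done, attempts = run_queue(["send-mail", "resize-image"], flaky_handler)
print(done, attempts)
```

Because the queue calls the app rather than the app polling the queue, no worker has to stay resident while the queue is empty.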

How task Queue Works

Tasks enqueue in a queue
Queue Moderator pulls from the head of the queue
It submits the task to the workers. Queue Moderator has capability to create new workers (threads).
Max number of threads depends on throughput
When a task is submitted, it could be running even before the enqueue request API call returns :-)

Edge Cases

Tasks have to be idempotent
Possible for a task to spuriously run twice even without failures.
You could use memcache or database to avoid it running twice, but that responsibility is on the developer
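One way a developer might guard against a task running twice is to record a marker per task id and skip the work if the marker is already present. The sketch below assumes this approach; the `processed` set stands in for memcache or a datastore entity, and `credit_task` is a hypothetical task.

```python
# Stand-in for memcache or a datastore marker entity (hypothetical).
processed = set()
balances = {"alice": 0}


def credit_task(task_id, user, amount):
    """Apply a credit at most once per task id, so a spurious rerun is harmless."""
    if task_id in processed:
        return "skipped"
    balances[user] += amount
    # In a real system the update and the marker would need to be written
    # atomically (for example in one transaction); this sketch ignores that.
    processed.add(task_id)
    return "applied"


print(credit_task("t1", "alice", 10))
print(credit_task("t1", "alice", 10))  # the queue ran the same task again
print(balances["alice"])
```

Without the marker, the second run would credit the account twice, which is exactly the failure that idempotent tasks are meant to prevent.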

Working with TaskQueues

Each task added to a single queue
You can create multiple queues per application

Working with ETA (Estimated time of Arrival)

How long until the task is executed
Different than "visibility timeouts" in other systems

Working with tasks: Names

Tasks can be named. If a task is not named, a name is auto-generated

Prevents tasks from accidentally being submitted multiple times

Concrete Example: Write behind cache

Minimizes writes with repeated cache flushing

Write new data to cache
Periodically read cache and persist to disk

To implement, user submits data to cache and a task to task queue
When the task queue is processed, task is dispatched and the task does a periodic read from the cache and writes to the datastore. Essentially using the TaskQueue as an executor
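The write-behind flow above can be sketched with in-memory stand-ins. This is an assumption-laden illustration, not App Engine code: the `WriteBehindCache` class, its dict-based cache and datastore, and the list used as a task queue are all hypothetical.

```python
class WriteBehindCache:
    """Buffer writes in a cache and persist them later from a queued task."""

    def __init__(self):
        self.cache = {}            # stand-in for memcache
        self.dirty = set()         # keys written since the last flush
        self.datastore = {}        # stand-in for the datastore
        self.task_queue = []       # stand-in for the task queue
        self.datastore_writes = 0  # counts batched writes, for illustration

    def write(self, key, value):
        self.cache[key] = value
        if not self.dirty:
            # Only the first write since the last flush enqueues a flush task.
            self.task_queue.append(self.flush)
        self.dirty.add(key)

    def flush(self):
        """The task body: read the cache and persist everything that changed."""
        for key in self.dirty:
            self.datastore[key] = self.cache[key]
        self.datastore_writes += 1
        self.dirty.clear()

    def run_tasks(self):
        while self.task_queue:
            self.task_queue.pop(0)()


store = WriteBehindCache()
for value in range(5):
    store.write("counter", value)
store.write("name", "x")
store.run_tasks()
print(store.datastore, store.datastore_writes)
```

Six writes to the cache end up as a single batched write to the datastore, which is the saving the pattern is after. Data still sitting only in the cache is lost if the cache is evicted before the flush runs.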

Python only at first. Java comes next
Java support in the works

Webhooks, JMS

The Future

Batch Processing

Task Queue is good for small datasets (<100k)
More tools needed for parallelization

Map Reduce in future

Eventually
Want it to work with small and large (Terabyte scale) datasets

The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore

Live Notes by @dushyanth

Datastore is

Transactional
Natively Partitioned - developer does not have to worry about scaling
Hierarchical - every entity can have a notion of parent
Schemaless - no restricted structure
Based on BigTable
Not a relational database
Not a SQL Engine

Simplifying Storage

Simplifies

Development
Management of applications

Scale always matters

Request volume
Data volume

Datastore Storage Model

Entity consists of

Kind
Key
Entity Group
0..n properties
If entity group == key, the entity is a parent

Heterogeneous property types. Properties can be of different types in different entities
Supports multi valued properties
Variable property - Having the same properties between entities is not needed
Soft Schema

It is a schema whose constraints are enforced only in the application layer
Simpler development process

Rapid typesafe prototype

Can be enforced by JDO or JPA metadata mappings

Transactions

Only transact within an entity group

Relationship Management

JDO and JPA are not just about object relationships

Transparent persistence
Object view of your data
Centralized mapping
Big maintainability win
AppEngine decides and manages which entity group the entity belongs to
Uses ownership to enforce entity group colocation

Future JDO/JPA work

Support unowned relationships

Bringing existing code to App Engine

Datastore is not a drop in replacement for RDBMS
Plan for data migration
Primary Keys

Single property keys: Straight forward way to map single property keys
Composite keys: Can map to ancestor chain
Mapping table: Can be represented using multi-value properties

And can be queried with set membership

Transactions

Identify roots in the data model
Identify operations that transact on multiple roots
Analyze impact of partial success

Refactor
Run compensating logic

Queries

Shift processing from reads to writes.

Denormalize
Expensive write and cheap reads
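Shifting work from reads to writes can be shown with a tiny denormalization sketch. The names below (`add_post`, `count_posts`, the per-author counter) are hypothetical and only illustrate the tradeoff: each write does a little extra work so that a read becomes a single lookup instead of a scan.

```python
posts = []                 # the normalized data
post_count_by_author = {}  # denormalized copy, maintained at write time


def add_post(author, text):
    """The write is more expensive: it also updates the precomputed count."""
    posts.append({"author": author, "text": text})
    post_count_by_author[author] = post_count_by_author.get(author, 0) + 1


def count_posts(author):
    """The read is cheap: one lookup, no scan over all posts."""
    return post_count_by_author.get(author, 0)


add_post("alice", "first")
add_post("bob", "hello")
add_post("alice", "second")
print(count_posts("alice"))
```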

Google Wave Client: Powered by GWT

Live Notes by @dushyanth

Wave UI requirements

It has got to be fast
Stunning
Optimistic UI

JSNI

Java can call javascript

Client Architecture

Bidirectional communication channel - keep alive http
Protocol Compiler

Generates interfaces, client + server implementations

Code Heavy

Can use UIBinder to plop GWT components into html

Most bugs are from CSS

Style Injector + CssResource
Looks like Minification + Image Spriting is done by GWT
Allows modularization of CSS
Different CSS for different browsers

Inefficient JSON handling

JSO - Javascript object structure
Didn't quite get it

Hosted mode isn't quite browser like

OOPHM (Out Of Process Hosted Mode) - Browser plugin to debug in eclipse

Download Size

runAsync(dynamic loading of code)
Download lazily

No transparency between javascript and java

SOYC (Story of your compile) reports
Java package to javascript breakdown report

JSOs cannot implement interfaces

SingleJsoImpl
In order to inline, JSOs cannot have polymorphic dispatch
At most one JSO class can implement a given interface

Improving Gears

Client side thumbnailing

They create a thumbnail using the workerpool before uploading the image to server.

Desktop drag n drop
Resumable uploading

Performance

Startup

runAsync
fast start
inline images + css
smaller download
stats collection
server-side script selection

Server sends down the correct javascript + css files based on http headers

Loaded Client

Optimistic UI (trying to guess what the user will click next)
Prefetching
Flyweight pattern
Rendering tricks

Mobile Client

Deferred binding saves the day
iPhone browser is always running
It loads faster than native apps

Testing

Use Model View Presenter design pattern - how is it different from MVC?
Prefer JUnit tests over GWTTestCase
Browser automation - WebDriver
Web driver is a developer focused tool for browser automation
Has native keyboard and mouse events, rather than synthesised via JS
iPhone Driver - automated testing on iPhone
Remote Web Driver - so web testing can be farmed out into a grid

I used to give client teams the Joel Test. It was good for the most part, but it does contain some pretty basic questions that almost all teams succeeded at. I retained a few from the Joel Test, and added some questions that I found were really essential to form a productive team.

Retained these from the Joel Test:
1. Can you make a build in one step?
2. Do you fix bugs before writing new code?
3. Do you use the best tools money can buy?
4. Do new candidates write code during their interview?

Added these to my list:
5. Do you use Scrum/XP?
6. Does your team do your own releases? (or is there a central release team that does that)
7. Can your team start/stop your prod servers and batch jobs (of course with proper audits)?
8. Do you insist on measuring round trip times before and after a performance enhancement was made?
9. Do you have realtime alerts comparing inputs and outputs of your system?
10. In enterprise integration scenarios, do you automate the validation of inputs from other teams?
11. Do developers get to interact frequently with the end users of the system?
12. Does your team get some choice in the frameworks they use or does your company mandate a standard one?
13. Do your developers participate/give tech talks? Do you sponsor if needed?

A word on Joel's Tests:
Joel's test has 12 questions. I have retained four. The rest got the axe because I think they are too common (using source control) or I have yet to see them in action (like hallway usability studies).

5. Do you use Scrum/XP?
Without going into the pros & cons of Scrum/XP, the only point I would like to add here is, both of them tend to keep the end user involved in product development. This cuts waste and gives direction.

6. Does your team do your own releases? (or is there a central release team that does that?)
In general, teams that can release and upgrade their own applications tend to be more productive, especially during release cycles. I've seen enough teams that were forced to go through a central release team. These poor teams often email the upgrade commands to the release team and twiddle their thumbs. When the release breaks (of course that happens), the team is forced to 'remote debug' for the release team. The release team usually has no time or expertise to deal with these app-specific issues. This brings morale down and makes upgrades a major pain for everyone in the team. Hence, they tend to lump upgrades together and avoid frequent releases altogether.

7. Can your developers start/stop your prod servers and batch jobs (of course with proper audits)?
Your team designed the app, architected it, and built it. They fix the bugs. If you trust them and give them responsibility for keeping the server up, they can handle it. If they need to bounce the server or force kick off a failed job, they have to be able to do it. If they are not able to do these tasks during production outages, someone ends up executing them manually, defeating the entire purpose. I do agree with the need for proper audits, but I strongly believe that depends on the manager doing enough checks.

8. Do you insist on measuring round trip times before and after a performance enhancement was made?
Nothing speaks like numbers in performance situations. Performance is a non-functional feature. You can see the effectiveness of a fix only if you measure the before and after effects.

9. Do you have realtime alerts comparing inputs and outputs of your system?
Nothing beats having checks that constantly compare the outputs with the expected output for the real-time input. When they go out of sync, an alarm is raised. This has the benefit of catching the error as soon as it occurs (fail fast), which is also as soon as the user encounters it. This ensures a very quick turnaround time.

10. In enterprise integration scenarios, do you automate the validation of inputs from other teams?
Suppose you rely on a service from another team. Having automated input validations to the values returned by the service (sometimes proactively) ensures that you catch data setup errors in systems that you depend on as well.

11. Do developers get to interact frequently with the end users of the system?
Above all, this makes your developers take responsibility for their code & bugs. No one likes appearing sloppy. If developers can see that their product is being used by end users who frequently talk to them it makes them put their best code forward.

12. Does your team get some choice in the frameworks they use or does your company mandate a standard one?
Too many companies force their developers to use a single framework/API for a task. This is generally made by someone 'up there' who believes too much in consistency. While I agree that having too much choice is bad, having a bit of choice is generally good. Using the right tool for the job makes all the difference.

13. Do your developers participate/give tech talks? Do you sponsor if needed?
Tough times, I know. But, the truly committed developer usually involves himself in attending (or better giving) talks. And, most of these talks usually pay for the speaker's expenses which makes it a no brainer.

Tech Voice

Wednesday, May 27, 2009

Live Google IO notes

Keynote 2

Offline Processing on App Engine: A Look Ahead

The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore

Google Wave Client: Powered by GWT

Wednesday, April 8, 2009

Java on the AppEngine

Monday, March 30, 2009

My Joel Test
