Wednesday, May 27, 2009

Live Google IO notes

Effective GWT: Developing a complex, high-performance app with Google Web Toolkit

  • Lombardi Software - Blueprint
  • GWT What and Why
    • Generates optimized javascript (like escape analysis etc)
  • High Fidelity Mockup
    • Done in Photoshop
    • More expensive
    • Finalize the icons and colors etc
  • Going to code
    • Be involved in the design
    • You need to know CSS and the HTML DOM
    • What is the appropriate DOM structure
    • How to create and manipulate it in GWT
  • Design
    • Design outer layer with divs
    • A faster way is to use an HTMLPanel and divs? (I did not catch what this meant)
    • The DOM structure is created by GWT's DecoratorPanel
    • And, you can apply css on them
  • Handling Window Resizing
    • Goal is to handle browser window resizing
    • With static HTML, you're limited to what you can achieve in CSS
    • Listen to ResizeEvent from window and propagate sizes down to children
    • Because they only use fixed-size rows, they can style table rows with background images
  • Animation
    • Not all browsers support CSS3 (?)
    • Helps users understand the behavior of application (provides visual feedback)
    • Done entirely in Java in GWT
  • Original Implementation
    • Iterate through your objects, create widgets and add them to containers
    • JavaScript object creation and GC are expensive, especially in IE6
  • New Implementation
    • Generate raw HTML in JavaScript
    • Use flyweight pattern for event handling
    • They build the HTML as a string in Java (compiled to JavaScript) and set it with DOM.setInnerHTML()
  • Event Handling
  • When All Else Fails
    • They dual compile code to Java and Javascript
    • If they find that the browser's JavaScript engine is slow, they render it on the server and send it to the client
    • So based on performance, they can dynamically move rendering between server and client
  • Compiling GWT code is slow
    • By default, GWT compiles code for 5 different browsers
    • You can tell GWT to compile code only for a single browser and locale; this speeds up development
    • Well, you can run hosted mode, which never compiles :-) or use GWT 2.0, which also never compiles and supports out-of-process hosted mode!
    • Instead of doing DOM manipulation over objects like Element.getStyle().setProperty('css property'), put that property in a css file
    • Check out the Episodes plugin from the creator of YSlow. It sends client performance numbers back to the server.
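
The "New Implementation" idea above (one HTML string instead of many widgets, plus one shared flyweight handler) does not depend on GWT, so here is a minimal sketch of it in Python rather than GWT Java. The names `render_rows`, `RowClickHandler` and the `data-row` attribute are made up for illustration and are not from the talk; `on_click` stands in for whatever a single delegated DOM listener would do.

```python
from html import escape


def render_rows(items):
    """Build the markup for every row as one string (one innerHTML-style assignment)."""
    parts = ['<table id="items">']
    for index, item in enumerate(items):
        # The row carries only an index; no per-row widget or handler object exists.
        parts.append('<tr data-row="%d"><td>%s</td></tr>' % (index, escape(item["name"])))
    parts.append("</table>")
    return "".join(parts)


class RowClickHandler:
    """One shared (flyweight) handler for all rows; row state is looked up per event."""

    def __init__(self, items):
        self.items = items
        self.selected = None

    def on_click(self, row_attribute):
        # row_attribute stands in for the data-row value read from the event target.
        self.selected = self.items[int(row_attribute)]
        return self.selected["name"]


items = [{"name": "Order <1>"}, {"name": "Invoice"}]
markup = render_rows(items)
handler = RowClickHandler(items)
```

The point of the design is that the cost per row is a few characters of markup, not a widget object plus a listener object, which is what hurt on IE6.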


This presentation is more of a war story. It deals with the Blueprint product. Because most of their clients run it in IE6, Lombardi had to go through extra steps to optimize their application to rely less on IE6's JavaScript engine. These techniques also apply when you have a really rich GWT application.

Transactions Across Datacenters (and Other Weekend Projects)
  • Consistency
    • Talked about Weak Consistency, Eventual Consistency (thanks to Werner for making this popular), Strong consistency (AppEngine datastore, File systems, RDBMSes, Azure tables)
  • Transactions
  • Why across datacenters?
    • Catastrophic failures, expected failures, routine maintenance, geo locality (CDN, edge caching etc)
    • Basically vertically partitioning your application
    • Packet roundtrip from west to east coast is 30ms
  • Why not across datacenters
    • Within a datacenter, communication costs much less and latency is low (1 ms within a rack, 1-5 ms across racks)
    • Outside datacenter
      • Expensive
      • High latency
  • Multihoming
    • ????
    • As soon as you write across multiple locations, you will have consistency problems
    • Real-time writes are always the hardest
    • Don't do it
    • A datacenter in Silicon Valley went down, and Twitter and FriendFeed were down for more than 2 hours. Neither had multihoming
  • Option 2:
    • Better but not ideal
      • Have multiple datacenters, have primary and secondary
      • Mediocre at catastrophic failure
      • Window of lost data because of asynchronous replication
    • Examples:
      • Amazon Web Services
      • Banks, Brokerages etc
    • Depending on systems, all your slaves can serve reads
  • Option 3: True Multihoming
    • Simultaneous writes in different data centers
    • Two way: hard
    • NASDAQ does 2 datacenters and does 2phase commit across them for transactions
    • Expensive and definitely slower
  • Techniques and Tradeoffs
    • Backups
      • Make a copy
      • Dog Fooding - they make other teams use their internal systems so they get to iterate and then release the API
    • Master Slave replication
      • Usually asynchronous
        • Good for throughput, latency
      • Most RDBMSs do binary log based replication
      • AppEngine also follows this model.
      • AppEngine write is much slower than a relational db
      • But, it is geared for read more than write
    • Multi Master Replication
      • Support writes at multiple locations and then merge them
      • Asynchronous, eventual consistency (Amazon's shopping cart service does this)
      • You cannot rely on a global clock
      • Because of this, you cannot do global transactions
      • Another way of thinking about this: it is like multithreading without locks
    • Two Phase Commit
      • Heavyweight, synchronous, high latency
      • Semi distributed as there is a coordinator
    • Paxos
      • Fully distributed consensus protocol
      • No single master like 2PC
      • Still has longer latency
      • Gives a better throughput than 2PC
  • Paxos for the Datastore
    • Closer datacenter? not really because you are doing two round trips
    • Same datacenter? no
    • Opt In...
  • Paxos for AppEngine
    • They use that to coordinate when moving between datacenters
    • Use a lock server
    • Managing memcache
  • Conclusion
    • No silver bullet
    • Embracing tradeoffs
    • Consistency is app driven, the platform cannot make that choice.
    • AppEngine is going to support options in consistency models in future (Nice)
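
To make the "Two Phase Commit" bullet above a bit more concrete, here is a toy in-memory sketch of the protocol's two phases. It is only an illustration under heavy assumptions: `Participant` and `two_phase_commit` are hypothetical names, there is no network, no timeouts and no coordinator failure handling, which are exactly the parts that make real 2PC heavyweight.

```python
class Participant:
    """A hypothetical datacenter-local store taking part in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.staged = None
        self.data = {}

    def prepare(self, key, value):
        # Phase 1: stage the write and vote yes or no.
        if not self.can_commit:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        # Phase 2 (success): make the staged write visible.
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def abort(self):
        # Phase 2 (failure): discard anything staged.
        self.staged = None


def two_phase_commit(participants, key, value):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Every write needs a full prepare round and a full commit round across all participants, which is where the "synchronous, high latency" cost in the notes comes from once the participants sit in different datacenters.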



Building Scalable Complex apps on AppEngine:

  • List Property
    • Property has multiple values
    • Maintains its order
    • Queried with an equals filter
    • Densely pack information instead of denormalizing it
    • Cut across all data and query on one of the values in the list property
    • select * from FavoriteColors where color = 'yellow' (here color is a list property)
    • Saves space to use list property
    • Uses more CPU to serialize and deserialize the list property
    • Never build a composite index on two list properties, because it creates a Cartesian product index
  • Concrete Example: Microblogging
    • Fanout of messages can be inefficient in terms of space
    • Message sending by reference
    • You would use list properties instead of joins
    • select * from messages where receiver = 'user'
  • Problem with List Property
    • selects load all of the list properties
  • Relational Index Entity
    • Split the message into two entities (message index and message)
    • We put them into same entity group and make message index a child of the message
    • There is a keys-only query that lets you fetch just the keys
    • Reads are 10 times faster and cheaper than with just plain list properties
  • Merge Join
    • AppEngine supports self joins
    • Data mining like operations
    • Don't have to build indexes in advance before this query
    • Can be used to test set membership
  • How does Merge Join work?
    • The datastore doesn't have histograms (RDBMSs use histograms to make a query plan)
    • They store all property indexes in sorted order
    • Uses zigzag algorithm
    • With 2 filters, it scans the first property index to find a match, then advances the second one to find a match. If the keys don't match, it advances the first one again, and so on until both the property value and the key match
    • select * from animal where legs = 4 and type = 'cow'
    • Scales with number of filters
    • Can't apply sort orders - must sort in memory
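
The zigzag description above can be sketched roughly as follows. This is my own minimal reconstruction, not the datastore's code: it assumes each equality filter (say legs = 4 and type = 'cow') has already been narrowed to a sorted list of unique entity keys, and it uses `bisect_left` as the "seek forward" step. The function name `zigzag_merge_join` is made up.

```python
from bisect import bisect_left


def zigzag_merge_join(indexes):
    """Intersect several sorted key lists by leapfrogging between them."""
    if not indexes or any(not index for index in indexes):
        return []
    positions = [0] * len(indexes)
    results = []
    candidate = indexes[0][0]
    matched = 0  # how many indexes in a row agreed on the candidate
    i = 0
    while True:
        index = indexes[i]
        # Seek forward in this index to the first key >= candidate.
        positions[i] = bisect_left(index, candidate, positions[i])
        if positions[i] == len(index):
            return results  # one index is exhausted, no more matches possible
        key = index[positions[i]]
        if key == candidate:
            matched += 1
            if matched >= len(indexes):
                # Every filter agrees on this key: it is a result.
                results.append(candidate)
                positions[i] += 1
                if positions[i] == len(index):
                    return results
                candidate = index[positions[i]]
                matched = 1
        else:
            # This index skipped past the candidate; its key is the new candidate.
            candidate = key
            matched = 1
        i = (i + 1) % len(indexes)
```

For example, `zigzag_merge_join([[1, 3, 4, 7, 9], [2, 3, 9, 10]])` walks both lists in turn and only ever moves forward. Since results come out in key order, this also shows why an arbitrary sort order cannot be applied on top and has to be done in memory.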

Keynote 2

  • New Google Product Google Wave
    • Platform
    • Product
    • Protocol
  • A wave is a conversation between multiple people.
  • Wave can be viewed as an enhanced Twitter or a hybrid (e-mail, IM, and word document)
    • Comment Support - Allows for inline comments or your typical comments at the end of the wave/posting.
    • Real-time updates from others - The document changes real-time while others are updating it.
    • Spell Check - A very sweet inline real-time spell checker that takes the word context into account, e.g., for "Can I have some been soup" it offered "Can I have some bean soup"
    • Play back changes - You can play back the changes in the conversation to see what was done and in what order. Very much like playing back a video.
    • Private replies - Supports for private conversations between users hidden from others on the wave.
    • Drag and Drop - Wave supports D&D from iPhoto.
    • Uses Google Contacts - Integrates with your Gmail and GTalk contacts
    • Wave cloning - Wave allows for cloning of an existing Wave to create a new wave. Reminds me of Git :)
      • When this occurs, all subscribers or people in the wave are notified.
    • Inline editing - Supports inline editing from Wave and external websites that are using the Wave plug-ins.
      • When changes are made, the document is marked up to reflect where the changes were done, ONLY from the last time *you* viewed it.
    • Collaboration Editing - Awesome support for changing content in the same document in the same area and changes are reflected in different colors.
    • Open Social Integration
      • Any open social app can live inside wave
    • Developer API -
      • A demo was given of a custom widget that allows a user to vote "Yes, Maybe, No"
      • Sudoku Widget that allows multiple players to play with one another.
      • Chess Widget that allows others to play one another and uses the playback feature. Nice integration.
      • Google Maps Widget that shows all other users where you're looking. Also, draw regions, add pins and more! Very sweet!!
    • Search - You can search your contacts or use their built in Google search to search the web and actually use the results in the document. i.e., search for an image and select it to embed it into the document.
    • Multiple languages - Supports multiple languages.
      • Real-time language translation using a program called "Rosy". This was very sweet!
    • Polls - A nice little extension allows for creating polls.
    • System Federation between Wave 'systems'
      • Wave systems can collaborate between one another.
      • Private Waves between people within the same 'wave system' are never sent to other wave servers in the federation.
  • Forms are native to Wave
  • They are going to open source the lion's share of the code.
  • Written in GWT and HTML5
  • Demos
    • There was a demo of dragging and dropping a file into the browser to create an attachment. This is not supported in HTML5 yet, it is a prototype.
    • A nice demo of how the API is used to write an external application like a blog. Very cool demo.
    • Orkut demo with using their embedded API.
    • Nice twitter integration that signs in to twitter and actually will post tweets.
    • Very sweet code.google.com integration tool.
  • All attendees at Google I/O will obtain an account to use Google Wave before it's released. Hence, come see Google IO. Did I mention we also got a free Cell phone with a full month of unlimited service?
  • Comes with a developers API (We're talking about Google, it's expected! :)
  • A minor bug occurred during the demo. Hey, we're talking about live demos (it turned out to be a misconfigured browser proxy).
  • Google Wave is great for team collaboration: add inline comments, embed images, view changes live, remove the conversation noise, release a final product, and more.

  • Website URLs
    • http://wave.google.com - Main website for Wave
    • http://code.google.com/apis/wave - API website
    • http://waveprotocol.org - The protocol website

Offline Processing on App Engine: A Look Ahead

By: Brett
Live Notes by @dushyanth

  • Motivation
    • AppEngine is great for request based database backed applications
    • Cron is good for periodic jobs, but not good enough
  • Problems with Polling
    • Wasted work as it is not event driven
    • Workers stay resident when there is no work wasting resources
    • Fixed number of workers. Or admins must manually add workers
    • Limited amount of optimization possible
      • Long lived hanging connections
    • Existing task queue like systems
      • MQ, Amazon SQS, Azure Queues, Starling (getting popular these days)
    • Task Queue API
      • Part of AppEngine Labs (API may change until it graduates from Labs)
      • Asynchronous execution for a first in first out queue.
      • If execution fails, work will be retried until successful
      • Tasks are lightweight to store. They are 3 times faster than storing in the datastore.
      • Tasks are scalable. The tasks can be started across a lot of machines.
      • Implements queueing. NOT pub-sub
      • Goals: High throughput, maximizing data throughput
      • Pushes tasks to the app. No polling
      • Uses Web hooks (It is a RESTful push-based interface for doing work)
      • Task is submitted as a web hook. If you get a 200 back, it succeeds.
      • Essentially combines queuing over REST.
      • Integrated into admin console as normal requests
      • Supports config driven throttling
        • Can be used to Prevent web services (external) from getting overloaded
        • Stay inside budget per hour etc
    • How task Queue Works
      • Tasks enqueue in a queue
      • Queue Moderator pulls from the head of the queue
      • It submits the task to the workers. Queue Moderator has capability to create new workers (threads).
      • Max number of threads depends on throughput
      • When a task is submitted, it could be running even before the enqueue request API call returns :-)
    • Edge Cases
      • Tasks have to be idempotent
      • Possible for a task to spuriously run twice even without failures.
      • You could use memcache or database to avoid it running twice, but that responsibility is on the developer
    • Working with TaskQueues
      • Each task added to a single queue
      • You can create multiple queues per application
      • Working with ETA (Estimated Time of Arrival)
        • How long until the task is executed
        • Different than "visibility timeouts" in other systems
      • Working with tasks: Names
        • Tasks can be named. If a task is not named, a name is auto generated
        • Prevents tasks from accidentally being submitted multiple times
    • Concrete Example: Write behind cache
      • Minimizes writes with repeated cache flushing
        • Write new data to cache
        • Periodically read cache and persist to disk
      • To implement, user submits data to cache and a task to task queue
      • When the task queue is processed, task is dispatched and the task does a periodic read from the cache and writes to the datastore. Essentially using the TaskQueue as an executor
    • Python only at first. Java comes next
    • Java support in the works
      • Webhooks, JMS
    • The Future
      • Batch Processing
        • Task Queue is good for small datasets (<100k)
        • More tools needed for parallelization
      • Map Reduce in future
        • Eventually
        • Want it to work with small and large (Terabyte scale) datasets
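
Here is a rough sketch of the write-behind cache example combined with the task naming and idempotency points from above. Big caveat: `TaskQueue` and `WriteBehindCache` are toy in-memory classes I made up to show the idea; this is not the App Engine Task Queue API, and plain dicts stand in for memcache and the datastore.

```python
from collections import deque


class TaskQueue:
    """Toy in-memory FIFO queue; NOT the App Engine Task Queue API."""

    def __init__(self):
        self.pending = deque()
        self.seen_names = set()

    def add(self, name, handler):
        # A repeated name is rejected, so the same task is not enqueued twice.
        if name in self.seen_names:
            return False
        self.seen_names.add(name)
        self.pending.append((name, handler))
        return True

    def run_all(self):
        while self.pending:
            name, handler = self.pending.popleft()
            try:
                handler()
            except Exception:
                # Failed tasks are retried, which is why handlers must be idempotent.
                self.pending.append((name, handler))


class WriteBehindCache:
    """Writes go to the cache; a named task later flushes them to the datastore."""

    def __init__(self, queue):
        self.queue = queue
        self.cache = {}      # stands in for memcache
        self.datastore = {}  # stands in for the datastore
        self.flushes = 0

    def write(self, key, value):
        self.cache[key] = value
        # All writes before the next flush share one task name, hence one flush task.
        self.queue.add("flush-%d" % self.flushes, self.flush)

    def flush(self):
        # Idempotent: copying the cache twice leaves the datastore in the same state.
        self.datastore.update(self.cache)
        self.flushes += 1
```

Five calls to `write` followed by `run_all` result in a single datastore update, which is the whole point of write-behind. Note that this toy `run_all` would retry forever on a handler that always fails; a real queue would back off.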


    The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore

    Live Notes by @dushyanth

    • Datastore is
      • Transactional
      • Natively Partitioned - developer does not have to worry about scaling
      • Hierarchical - every entity can have a notion of parent
      • Schemaless - no restricted structure
      • Based on BigTable
      • Not a relational database
      • Not a SQL Engine
    • Simplifying Storage
      • Simplifies
        • Development
        • Management of applications
      • Scale always matters
        • Request volume
        • Data volume
    • Datastore Storage Model
      • Entity consists of
        • Kind
        • Key
        • Entity Group
        • 0..n properties
        • If entity group == key, the entity is a parent
      • Heterogeneous property types. Properties can be of different types in different entities
      • Supports multi valued properties
      • Variable property - Having the same properties between entities is not needed
      • Soft Schema
        • It is a schema whose constraints are enforced only in the application layer
        • Simpler development process
          • Rapid typesafe prototype
        • Can be enforced by JDO or JPA metadata mappings
    • Transactions
      • Can only transact within a single entity group
    • Relationship Management
      • JDO and JPA are not just about object relationships
        • Transparent persistence
        • Object view of your data
        • Centralized mapping
        • Big maintainability win
        • AppEngine decides and manages which entity group the entity belongs to
        • Uses ownership to enforce entity group colocation
    • Future JDO/JPA work
      • Support unowned relationships
    • Bringing existing code to App Engine
      • Datastore is not a drop in replacement for RDBMS
      • Plan for data migration
      • Primary Keys
        • Single property keys: Straight forward way to map single property keys
        • Composite keys: Can map to ancestor chain
        • Mapping table: Can be represented using multi-value properties
          • And can be queried with set membership
      • Transactions
        • Identify roots in the data model
        • Identify operations that transact on multiple roots
        • Analyze impact of partial success
          • Refactor
          • Run compensating logic
      • Queries
        • Shift processing from reads to writes.
          • Denormalize
          • Expensive write and cheap reads


    Google Wave Client: Powered by GWT

    Live Notes by @dushyanth
    • Wave UI requirements
      • It has got to be fast
      • Stunning
      • Optimistic UI
    • JSNI
      • Java can call javascript
    • Client Architecture
      • Bidirectional communication channel - keep alive http
      • Protocol Compiler
        • Generates interfaces, client + server implementations
    • GWT
      • Code Heavy
        • Can use UiBinder to plop GWT components into HTML
      • Most bugs are from CSS
        • Style Injector + CssResource
        • Looks like Minification + Image Spriting is done by GWT
        • Allows modularization of CSS
        • Different CSS for different browsers
      • Inefficient JSON handling
        • JSO - Javascript object structure
        • Didn't quite get it
      • Hosted mode isn't quite browser like
        • OOPHM (Out Of Process Hosted Mode) - Browser plugin to debug in eclipse
      • Download Size
        • runAsync(dynamic loading of code)
        • Download lazily
      • No transparency between javascript and java
        • SOYC (Story of your compile) reports
        • Java package to javascript breakdown report
      • JSOs cannot implement interfaces
        • SingleJsoImpl
        • In order to inline, JSOs cannot have polymorphic dispatch
        • At most one JSO class may implement a given interface
    • Improving Gears
      • Client side thumbnailing
        • They create a thumbnail using the workerpool before uploading the image to server.
      • Desktop drag n drop
      • Resumable uploading
    • Performance
      • Startup
        • runAsync
        • fast start
        • inline images + css
        • smaller download
        • stats collection
        • server-side script selection
          • Server sends down the correct javascript + css files based on http headers
      • Loaded Client
        • Optimistic UI (trying to guess what the user will click next)
        • Prefetching
        • Flyweight pattern
        • Rendering tricks
    • Mobile Client
      • Deferred binding saves the day
      • iPhone browser is always running
      • It loads faster than native apps
    • Testing
      • Use Model View Presenter design pattern - how is it different from MVC?
      • Prefer JUnit tests over GWTTestCase
      • Browser automation - WebDriver
      • Web driver is a developer focused tool for browser automation
      • Has native keyboard and mouse events, rather than synthesised via JS
      • iPhone Driver - automated testing on iPhone
      • Remote Web Driver - so web testing can be farmed out into a grid
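
On the Model View Presenter point in the testing notes above: as far as I understand it, the win is that the presenter holds the UI logic and only talks to the view through a narrow interface, so it can be tested with a plain unit test and a fake view, no browser needed. A tiny sketch of that idea, in Python rather than GWT Java, with made-up names (`FakeView`, `WavePresenter`, `on_title_edited`):

```python
class FakeView:
    """Test double for the view; a real one would wrap browser widgets."""

    def __init__(self):
        self.shown = None
        self.error = None

    def show_title(self, text):
        self.shown = text

    def show_error(self, text):
        self.error = text


class WavePresenter:
    """All UI logic lives here and talks to the view only through plain methods."""

    def __init__(self, view):
        self.view = view

    def on_title_edited(self, text):
        title = text.strip()
        if not title:
            self.view.show_error("Title must not be empty")
        else:
            self.view.show_title(title)
```

A test just builds `WavePresenter(FakeView())`, calls `on_title_edited`, and inspects the fake. That is presumably why plain JUnit tests were preferred over the slower GWTTestCase.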