Wednesday, May 27, 2009

Live Google IO notes

Effective GWT: Developing a complex, high-performance app with Google Web Toolkit

  • Lombardi Software - Blueprint
  • GWT What and Why
    • Generates optimized javascript (like escape analysis etc)
  • High Fidelity Mockup
    • Done in photoshop
    • More expensive
    • Finalize the icons and colors etc
  • Going to code
    • Be involved in the design
    • You need to know css and HTML DOM
    • What is the appropriate DOM structure
    • How to create and manipulate GWT
  • Design
    • Design outer layer with divs
    • Faster way is to do html panel and divs ? (what does that mean?)
    • DOM structure is created by GWT decorator panel
    • And, you can apply css on them
  • Handling Window Resizing
    • Goal is to handle browser window resizing
    • With static HTML, you're limited to what you can achieve in CSS
    • Listen to ResizeEvent from window and propagate sizes down to children
    • Because they only use fixed-size rows, they can use background images to style table rows
  • Animation
    • Not all browsers do CSS3 ?
    • Helps users understand the behavior of application (provides visual feedback)
    • Done all in java in GWT
  • Original Implementation
    • Iterate through your objects, create widgets and add it to containers
    • Javascript - object creation and GC is expensive especially in IE6
  • New Implementation
    • Generate raw HTML in Javascript
    • Use flyweight pattern for event handling
    • They create html inside java (javascript) and do a DOM.setInnerHTML()
  • Event Handling
  • When All Else Fails
    • They dual compile code to Java and Javascript
    • If they find that the browser's javascript engine is slow, they render it on the server and send it to the client
    • So based on performance, they can dynamically move rendering between server and client
  • Compiling GWT code is slow
    • By default, GWT compiles code for 5 different browsers
    • You can tell GWT to compile code only for a single browser and locale, which speeds up development time
    • Well, you can run hosted mode and that never compiles :-) or use GWT 2.0 it never compiles and supports out of process hosted mode !
    • Instead of doing DOM manipulation over objects like Element.getStyle().setProperty('css property'), put that property in a css file
    • Checkout Episodes plugin from the creator of YSlow. It sends client performance numbers back to server.


This presentation is more of a war story. It deals with the Blueprint product. Because most of their clients run it in IE6, Lombardi had to go through extra steps to optimize their application to rely less on IE6's javascript engine. These techniques also apply when you have a really rich GWT application.
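
To make the "New Implementation" idea above concrete, here is a minimal sketch of it. It is written in Python rather than GWT Java, and every name in it (render_rows, RowClickHandler, the data-id attribute) is my own assumption for illustration, not Lombardi's code. It builds one HTML string for all rows instead of one widget per object, and uses a single shared handler that looks up the clicked item by id, which is the flyweight part.

```python
from html import escape


def render_rows(items):
    """Build one HTML string for all rows instead of one widget per item."""
    parts = ['<table id="items">']
    for item in items:
        # The row id is stored in the markup so one handler can find the item later.
        parts.append(
            '<tr data-id="%d"><td>%s</td></tr>' % (item["id"], escape(item["name"]))
        )
    parts.append("</table>")
    return "".join(parts)


class RowClickHandler:
    """A single shared (flyweight) handler for every row in the container."""

    def __init__(self, items):
        self._by_id = {item["id"]: item for item in items}
        self.clicked = []

    def on_click(self, target_row_id):
        # In a browser the id would be read from the event target's attribute.
        item = self._by_id[target_row_id]
        self.clicked.append(item["name"])
        return item


items = [{"id": 1, "name": "Order <1>"}, {"id": 2, "name": "Invoice"}]
html = render_rows(items)
handler = RowClickHandler(items)
handler.on_click(2)
```

The resulting string would be handed to something like the DOM.setInnerHTML() call mentioned in the notes in one shot, so the browser creates the rows natively and no per-row objects or listeners are allocated in script.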

Transactions Across Datacenters (and Other Weekend Projects)
  • Consistency
    • Talked about Weak Consistency, Eventual Consistency (thanks to Werner for making this popular), Strong consistency (AppEngine datastore, File systems, RDBMSes, Azure tables)
  • Transactions
  • Why across datacenters?
    • Catastrophic failures, expected failures, routine maintenance, geo locality (CDN, edge caching etc)
    • Basically vertically partitioning your application
    • Packet roundtrip from west to east coast is 30ms
  • Why not across datacenters
    • Within a datacenter, it costs much less to communicate, and latency is low (1ms within rack, 1-5 ms across)
    • Outside datacenter
      • Expensive
      • High latency
  • Multihoming
    • ????
    • As soon as you write across multiple locations, you will have consistency problems
    • Realtime writes are always the hardest
    • Don't do it
    • A datacenter in silicon valley went down and twitter and friendfeed went down for more than 2 hrs. Both did not have multihoming
  • Option 2:
    • Better but not ideal
      • Have multiple datacenters, have primary and secondary
      • Mediocre at catastrophic failure
      • window of lost data because of asynchronous replication
    • Examples:
      • Amazon Web Services
      • Banks, Brokerages etc
    • Depending on systems, all your slaves can serve reads
  • Option 3: True Multihoming
    • Simultaneous writes in different data centers
    • Two way: hard
    • NASDAQ runs 2 datacenters and does two-phase commit across them for transactions
    • Expensive and definitely slower
  • Techniques and Tradeoffs
    • Backups
      • Make a copy
      • Dog Fooding - they make other teams use their internal systems so they get to iterate and then release the API
    • Master Slave replication
      • Usually asynchronous
        • Good for throughput, latency
      • Most RDBMSs do binary log based replication
      • AppEngine also follows this model.
      • AppEngine write is much slower than a relational db
      • But, it is geared for read more than write
    • Multi Master Replication
      • Support writes at multiple locations and then merge them
      • Asynchronous, eventual consistency (Amazon's shopping cart service does this)
      • You cannot rely on a global clock
      • Because of this, you cannot do global transactions
      • Another way of thinking about this: it is like multithreading without locks
    • Two Phase Commit
      • Heavyweight, synchronous, high latency
      • Semi distributed as there is a coordinator
    • Paxos
      • Fully distributed consensus protocol
      • No single master like 2PC
      • Still has longer latency
      • Gives a better throughput than 2PC
  • Paxos for the Datastore
    • Closer datacenter? not really because you are doing two round trips
    • Same datacenter? no
    • Opt In...
  • Paxos for AppEngine
    • They use that to coordinate when moving between datacenters
    • Use a lock server
    • Managing memcache
  • Conclusion
    • No silver bullet
    • Embracing tradeoffs
    • Consistency is app driven, the platform cannot make that choice.
    • AppEngine is going to support options in consistency models in future (Nice)
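
For my own reference, here is a rough sketch of the two-phase commit idea from the notes above, with a coordinator and two datacenters as participants. It is plain Python with made-up class and function names (Participant, two_phase_commit), runs entirely in memory, and ignores timeouts and coordinator failure, which are exactly the hard parts in practice.

```python
class Participant:
    """One datacenter taking part in the transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes or no on whether the write can be applied.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    # Phase 1 costs one round trip to every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 costs a second round trip, hence the high latency.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


west, east = Participant("west"), Participant("east")
ok = two_phase_commit([west, east])

flaky = Participant("east", can_commit=False)
west2 = Participant("west")
failed = two_phase_commit([west2, flaky])
```

The single coordinator function is what makes this only "semi distributed"; Paxos removes that single master at the cost of a more involved protocol.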



Building Scalable Complex apps on AppEngine:

  • List Property
    • Property has multiple values
    • Maintains its order
    • Queried with an equals filter
    • Densely pack information instead of denormalizing it
    • Cut across all data and query on one of the values in the list property
    • select * from FavoriteColors where color = 'yellow' (here color is a list property)
    • Saves space to use list property
    • Uses more CPU to serialize and deserialize the list property
    • Never have composite index between two list properties because it creates a cartesian product index
  • Concrete Example: Microblogging
    • Fanout of messages can be inefficient in terms of space
    • Message sending by reference
    • You would use list properties instead of joins
    • select * from messages where receiver = 'user'
  • Problem with List Property
    • selects load all of the list properties
  • Relational Index Entity
    • Split the message into two entities (message index and message)
    • We put them into same entity group and make message index a child of the message
    • There is a keys-only query that lets you fetch just the keys
    • Reads are 10 times faster and cheaper than with just plain list properties
  • Merge Join
    • AppEngine supports self joins
    • Data mining like operations
    • Don't have to build indexes in advance before this query
    • Can be used to test set membership
  • How does Merge Join work?
    • The datastore does not keep histograms (RDBMSs use histograms to make a query plan)
    • They store all property indexes in sorted order
    • Uses zigzag algorithm
    • If we are using 2 filters, it scans the first property to find a match, then moves the second one to find a match. Then, if the keys don't match, it moves the first one until both the property and key match
    • select * from animal where legs = 4 and type = 'cow'
    • Scales with number of filters
    • Can't apply sort orders - must sort in memory
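
Here is how I understand the zigzag description above, as a small Python sketch. Each filter (for example legs = 4) is assumed to give a sorted list of entity keys, and the function name and sample keys are mine, not App Engine's. The scan takes a candidate key from one list, seeks forward in the next list, and whenever a list overshoots, the overshoot becomes the new candidate.

```python
from bisect import bisect_left


def zigzag_merge_join(*sorted_key_lists):
    """Intersect sorted key lists by seeking back and forth between them."""
    if not sorted_key_lists or any(not keys for keys in sorted_key_lists):
        return []
    positions = [0] * len(sorted_key_lists)
    results = []
    candidate = sorted_key_lists[0][0]
    while True:
        agreed = True
        for i, keys in enumerate(sorted_key_lists):
            # Seek forward to the first key >= candidate in this index.
            pos = bisect_left(keys, candidate, positions[i])
            if pos == len(keys):
                return results  # one index is exhausted, no more matches
            positions[i] = pos
            if keys[pos] != candidate:
                # Overshoot: this key becomes the new candidate (the "zag").
                candidate = keys[pos]
                agreed = False
                break
        if agreed:
            results.append(candidate)
            # Move the first index past the match and continue.
            positions[0] += 1
            if positions[0] == len(sorted_key_lists[0]):
                return results
            candidate = sorted_key_lists[0][positions[0]]


# Hypothetical index scans for: legs = 4 and type = 'cow'
legs_4 = ["cat1", "cow1", "cow2", "dog1"]
type_cow = ["cow1", "cow2", "cow3"]
matches = zigzag_merge_join(legs_4, type_cow)
```

Because the results come out in key order, this also shows why you can't apply another sort order without sorting in memory afterwards.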

Keynote 2

  • New Google Product Google Wave
    • Platform
    • Product
    • Protocol
  • A wave is a conversation between multiple people.
  • Wave can be viewed as an enhanced Twitter or a hybrid (e-mail, IM, and word document)
    • Comment Support - Allows for inline comments or your typical comments at the end of the wave/posting.
    • Real-time updates from others - The document changes real-time while others are updating it.
    • Spell Check - A very sweet inline real-time spell checker that takes the word context into account, i.e., for "Can I have some been soup" it offered the following: "Can I have some bean soup"
    • Play back changes - You can play back the changes in the conversation to see what was done and in what order. Very much like playing back a video.
    • Private replies - Support for private conversations between users, hidden from others on the wave.
    • Drag and Drop - Wave supports D&D from iPhoto.
    • Uses Google Contacts - Integrates with your Gmail and GTalk contacts
    • Wave cloning - Wave allows for cloning of an existing Wave to create a new wave. Reminds me of Git :)
      • When this occurs, all subscribers or people in the wave are notified.
    • Inline editing - Supports inline editing from Wave and external websites that are using the Wave plug-ins.
      • When changes are made, the document is marked up to reflect where the changes were done, ONLY from the last time *you* viewed it.
    • Collaboration Editing - Awesome support for changing content in the same document in the same area and changes are reflected in different colors.
    • Open Social Integration
      • Any open social app can live inside wave
    • Developer API -
      • A demo was given of a custom widget that allows a user to vote "Yes, Maybe, No"
      • Sudoku Widget that allows multiple players to play with one another.
      • Chess Widget that allows others to play one another and uses the playback feature. Nice integration.
      • Google Maps Widget that shows all other users where you're looking. Also, draw regions, add pins and more! Very sweet!!
    • Search - You can search your contacts or use their built in Google search to search the web and actually use the results in the document. i.e., search for an image and select it to embed it into the document.
    • Multiple languages - Supports multiple languages.
      • Real-time language translation using a program called "Rosy". This was very sweet!
    • Polls - A nice little extension allows for creating polls.
    • System Federation between Wave 'systems'
      • Wave systems can collaborate between one another.
      • Private Waves between people within the same 'wave system' are never sent to other wave servers in the federation.
  • Forms are native to Wave
  • They are going to open source the lion's share of the code.
  • Written in GWT and HTML5
  • Demos
    • There was a demo of dragging and dropping a file into the browser to create an attachment. This is not supported in HTML5 yet, it is a prototype.
    • A nice demo of how the API is used to write an external application like a blog. Very cool demo.
    • Orkut demo with using their embedded API.
    • Nice twitter integration that signs in to twitter and actually will post tweets.
    • Very sweet code.google.com integration tool.
  • All attendees at Google I/O will obtain an account to use Google Wave before it's released. Hence, come see Google IO. Did I mention we also got a free Cell phone with a full month of unlimited service?
  • Comes with a developers API (We're talking about Google, it's expected! :)
  • A minor bug occurred during the demo. Hey, we're talking about live demos (it turned out to be a misconfigured browser proxy).
  • Google Wave is great for team collaboration: adding inline comments, embedding images, viewing changes live, removing the conversation noise, releasing a final product and more.

  • Website URLs
    • http://wave.google.com - Main website for Wave
    • http://code.google.com/apis/wave - API website
    • http://waveprotocol.org - The protocol website

Offline Processing on App Engine: A Look Ahead

By: Brett
Live Notes by @dushyanth

  • Motivation
    • AppEngine is great for request based database backed applications
    • Cron is good for periodic jobs, but not good enough
  • Problems with Polling
    • Wasted work as it is not event driven
    • Workers stay resident when there is no work wasting resources
    • Fixed number of workers. Or admins must manually add workers
    • Limited amount of optimization possible
      • Long lived hanging connections
    • Existing task queue like systems
        • MQ, Amazon SQS, Azure Queues, Starling (getting popular these days)
    • Task Queue API
      • Part of AppEngine Labs (API may change until it graduates from Labs)
      • Asynchronous execution for a first in first out queue.
      • If execution fails, work will be retried until successful
      • Tasks are light weight to store. They are 3 times faster than storing in the datastore.
      • Tasks are scalable. The tasks can be started across a lot of machines.
      • Implements queueing. NOT pub-sub
      • Goals: High throughput, maximizing data throughput
      • Pushes tasks to the app. No polling
      • Uses Web hooks (It is a RESTful push-based interface for doing work)
      • Task is submitted as a web hook. If you get a 200 back, it succeeds.
      • Essentially combines queuing over REST.
      • Integrated into admin console as normal requests
      • Supports config driven throttling
        • Can be used to Prevent web services (external) from getting overloaded
        • Stay inside budget per hour etc
    • How task Queue Works
      • Tasks enqueue in a queue
      • Queue Moderator pulls from the head of the queue
      • It submits the task to the workers. Queue Moderator has capability to create new workers (threads).
      • Max number of threads depends on throughput
      • When a task is submitted, it could be running even before the enqueue request API call returns :-)
    • Edge Cases
      • Tasks have to be idempotent
      • Possible for a task to spuriously run twice even without failures.
      • You could use memcache or database to avoid it running twice, but that responsibility is on the developer
    • Working with TaskQueues
      • Each task added to a single queue
      • You can create multiple queues per application
      • Working with ETA (Estimated time of Arrival)
        • How long until the task is executed
        • Different than "visibility timeouts" in other systems
      • Working with tasks: Names
        • Tasks can be named. If a task is not named, it is auto generated
      • Prevents tasks from accidentally being submitted multiple times
    • Concrete Example: Write behind cache
      • Minimizes writes with repeated cache flushing
        • Write new data to cache
        • Periodically read cache and persist to disk
      • To implement, user submits data to cache and a task to task queue
      • When the task queue is processed, task is dispatched and the task does a periodic read from the cache and writes to the datastore. Essentially using the TaskQueue as an executor
    • Python only at first. Java comes next
    • Java support in the works
      • Webhooks, JMS
    • The Future
      • Batch Processing
        • Task Queue is good for small datasets (<100k)
        • More tools needed for parallelization
      • Map Reduce in future
        • Eventually
        • Want it to work with small and large (Terabyte scale) datasets
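
To check my understanding of the write-behind cache example and the edge cases above, here is a small in-memory sketch. It does not use the real Task Queue API, which was still in Labs. The TaskQueue class, the flush and write helpers, and the "window" number baked into the task name are all my own assumptions. It shows two things from the notes: a task name stops the same flush from being enqueued many times, and the flush is idempotent so a spurious second run does no harm.

```python
from collections import deque


class TaskQueue:
    """In-memory stand-in for a push queue with named tasks (hypothetical)."""

    def __init__(self):
        self._pending = deque()
        self._seen_names = set()

    def add(self, name, handler):
        # A name that was already used is rejected, so repeated
        # submissions collapse into a single task.
        if name in self._seen_names:
            return False
        self._seen_names.add(name)
        self._pending.append(handler)
        return True

    def __len__(self):
        return len(self._pending)

    def run_all(self):
        # FIFO dispatch; a real queue would retry handlers that fail.
        while self._pending:
            self._pending.popleft()()


cache = {}
datastore = {}
queue = TaskQueue()


def flush(key):
    # Idempotent: persisting the same cached value twice changes nothing.
    if key in cache:
        datastore[key] = cache[key]


def write(key, value, window):
    # Write to the cache now, and schedule at most one flush per time window.
    cache[key] = value
    queue.add("flush-%s-%d" % (key, window), lambda: flush(key))


for value in (1, 2, 3):
    write("score", value, window=0)
pending_before_run = len(queue)
queue.run_all()
flush("score")  # simulate the task spuriously running a second time
```

Three writes end up as one datastore write, which is the point of the pattern.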


    The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore

    Live Notes by @dushyanth

    • Datastore is
      • Transactional
      • Natively Partitioned - developer does not have to worry about scaling
      • Hierarchical - every entity can have a notion of parent
      • Schemaless - no restricted structure
      • Based on BigTable
      • Not a relational database
      • Not a SQL Engine
    • Simplifying Storage
      • Simplifies
        • Development
        • Management of applications
      • Scale always matter
        • Request volume
        • Data volume
    • Datastore Storage Model
      • Entity consists of
        • Kind
        • Key
        • Entity Group
        • 0..n properties
        • If entity group == key, the entity is a parent
      • Heterogeneous property types. Properties can be of different types in different entities
      • Supports multi valued properties
      • Variable property - Having the same properties between entities is not needed
      • Soft Schema
        • It is a schema whose constraints are enforced only in the application layer
        • Simpler development process
          • Rapid typesafe prototype
        • Can be enforced by JDO or JPA metadata mappings
    • Transactions
      • Can only transact within a single entity group
    • Relationship Management
      • JDO and JPA are not just about object relationships
        • Transparent persistence
        • Object view of your data
        • Centralized mapping
        • Big maintainability win
        • AppEngine decides and manages which entity group the entity belongs to
        • Uses ownership to enforce entity group colocation
    • Future JDO/JPA work
      • Support unowned relationships
    • Bringing existing code to App Engine
      • Datastore is not a drop in replacement for RDBMS
      • Plan for data migration
      • Primary Keys
        • Single property keys: Straight forward way to map single property keys
        • Composite keys: Can map to ancestor chain
        • Mapping table: Can be represented using multi-value properties
          • And can be queried with set membership
      • Transactions
        • Identify roots in the data model
        • Identify operations that transact on multiple roots
        • Analyze impact of partial success
          • Refactor
          • Run compensating logic
      • Queries
        • Shift processing from reads to writes.
          • Denormalize
          • Expensive write and cheap reads
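
The "soft schema" point above (constraints enforced only in the application layer) is easier to see with a toy example. This is a Python sketch with invented names (SCHEMA, SchemalessStore, checked_put), not the JDO/JPA mapping from the talk: the store itself accepts anything, and the application wraps writes with its own checks.

```python
# Hypothetical application-level schema: kind -> property name -> type
SCHEMA = {"Person": {"name": str, "age": int}}


def validate(kind, properties):
    """Return a list of violations of the soft schema (empty if valid)."""
    errors = []
    for prop, expected in SCHEMA[kind].items():
        if prop not in properties:
            errors.append("missing %s" % prop)
        elif not isinstance(properties[prop], expected):
            errors.append("%s should be %s" % (prop, expected.__name__))
    return errors


class SchemalessStore:
    """Stand-in for a schemaless datastore: it never rejects an entity."""

    def __init__(self):
        self.entities = []

    def put(self, kind, properties):
        self.entities.append((kind, dict(properties)))


def checked_put(store, kind, properties):
    # The constraint lives here, in application code, not in the store.
    errors = validate(kind, properties)
    if errors:
        raise ValueError("; ".join(errors))
    store.put(kind, properties)


store = SchemalessStore()
store.put("Person", {"age": "old"})  # the raw store accepts this
checked_put(store, "Person", {"name": "Ann", "age": 30})
```

In the Java world the JDO or JPA metadata plays the role of SCHEMA here, which is how you get a typesafe prototype on top of a store that does not care.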


    Google Wave Client: Powered by GWT

    Live Notes by @dushyanth
    • Wave UI requirements
      • It has got to be fast
      • Stunning
      • Optimistic UI
    • JSNI
      • Java can call javascript
    • Client Architecture
      • Bidirectional communication channel - keep alive http
      • Protocol Compiler
        • Generates interfaces, client + server implementations
    • GWT
      • Code Heavy
        • Can use UIBinder to plop GWT components into html
      • Most bugs are from CSS
        • Style Injector + CssResource
        • Looks like Minification + Image Spriting is done by GWT
        • Allows modularization of CSS
        • Different CSS for different browsers
      • Inefficient JSON handling
        • JSO - Javascript object structure
        • Didn't quite get it
      • Hosted mode isn't quite browser like
        • OOPHM (Out Of Process Hosted Mode) - Browser plugin to debug in eclipse
      • Download Size
        • runAsync(dynamic loading of code)
        • Download lazily
      • No transparency between javascript and java
        • SOYC (Story of your compile) reports
        • Java package to javascript breakdown report
      • JSOs cannot implement interfaces
        • SingleJsoImpl
        • In order to inline, JSOs cannot have polymorphic dispatch
        • At most one JSO class can implement any given interface
    • Improving Gears
      • Client side thumbnailing
        • They create a thumbnail using the workerpool before uploading the image to server.
      • Desktop drag n drop
      • Resumable uploading
    • Performance
      • Startup
        • runAsync
        • fast start
        • inline images + css
        • smaller download
        • stats collection
        • server-side script selection
          • Server sends down the correct javascript + css files based on http headers
      • Loaded Client
        • Optimistic UI (trying to guess what the user will click next)
        • Prefetching
        • Flyweight pattern
        • Rendering tricks
    • Mobile Client
      • Deferred binding saves the day
      • iPhone browser is always running
      • It loads faster than native apps
    • Testing
      • Use Model View Presenter design pattern - how is it different from MVC?
      • Prefer JUnit tests over GWTTestCase
      • Browser automation - WebDriver
      • Web driver is a developer focused tool for browser automation
      • Has native keyboard and mouse events, rather than synthesised via JS
      • iPhone Driver - automated testing on iPhone
      • Remote Web Driver - so web testing can be farmed out into a grid
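
On the Model View Presenter point in the testing notes above, here is a tiny sketch of why it helps with plain unit tests. It is Python rather than GWT Java, and ContactPresenter and FakeView are names I made up. The presenter holds the logic and talks to the view only through a couple of methods, so a test can pass in a fake view and never needs a browser or a GWTTestCase.

```python
class ContactPresenter:
    """Holds the UI logic; knows the view only through its methods."""

    def __init__(self, view, contacts):
        self.view = view
        self.contacts = contacts

    def on_search(self, text):
        matches = [c for c in self.contacts if text.lower() in c.lower()]
        if matches:
            self.view.show_contacts(matches)
        else:
            self.view.show_message("No contacts match '%s'" % text)


class FakeView:
    """Test double that records calls instead of touching any DOM."""

    def __init__(self):
        self.calls = []

    def show_contacts(self, contacts):
        self.calls.append(("contacts", contacts))

    def show_message(self, message):
        self.calls.append(("message", message))


view = FakeView()
presenter = ContactPresenter(view, ["Alice", "Bob", "Alicia"])
presenter.on_search("ali")
presenter.on_search("zed")
```

As far as I can tell, the practical difference from MVC here is that the view is kept passive and all decisions sit in the presenter, which is what makes the fake view enough for a test.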

    Wednesday, April 8, 2009

    Java on the AppEngine

    Google has released support for Java on the AppEngine.

    Here are some areas that really interest me:
    Languages:
    Since the support is at a Java Platform level, other languages like JRuby, Scala, Clojure, Groovy will be able to run on the AppEngine.

    Enterprise Support:
    AppEngine has been around for about a year with support for Python. The typical use case for AppEngine was simple webapps (backed by Google's BigTable). There was no support for batch jobs and no support for bulk loading of data. Yesterday's announcement provides support for those two and includes support for a Secure Data Connector that lets you access your enterprise's data that is present behind a firewall.

    There were rumors about Java support on AppEngine for a few weeks now. Typically, AppEngine's features are aimed at a much higher level on the technology stack. If you look at Python support for AppEngine, you will notice that it ships with a custom web framework that you can use to deploy applications. Java's support comes at a much lower level. It is much closer to the Servlet and JSP specification. These are not the exact tools you would choose to run your web apps in Java these days. It looks like Google has been counting on enabling developers to run popular web frameworks on top of AppEngine. They have been working with people from the Java community, enabling them to build support for AppEngine in their products.

    I have been playing with the Java support in AppEngine. It integrates very nicely into Eclipse through a plugin and supports one-click deployment. Not that you will want to deploy to your production server from Eclipse ;-) But, developers can deploy to their dev versions of the application using it.

    Here are some quick stats on the environment that your programs will be running in:
    Environment:

    java.specification.version=1.6
    java.vendor=Sun Microsystems Inc.
    line.separator=\n
    java.class.version=50.0 (I use a Mac OS X, AppEngine supports both Java 5 and Java 6)
    java.util.logging.config.file=WEB-INF/logging.properties
    java.specification.name=Java Platform API Specification
    java.vendor.url=http\://java.sun.com/
    java.vm.version=1.6.0_13
    os.name=Linux
    java.version=1.6.0_13
    java.vm.specification.version=1.0
    user.dir=/base/data/home/apps//1.332645749880898171
    java.specification.vendor=Sun Microsystems Inc.
    java.vm.specification.name=Java Virtual Machine Specification
    java.vm.vendor=Sun Microsystems Inc.
    file.separator=/
    path.separator=\:
    java.vm.specification.vendor=Sun Microsystems Inc.
    java.vm.name=Java HotSpot(TM) Client VM
    file.encoding=ANSI_X3.4-1968

    Other Properties:
    availableProcessors: 1337 (humor ;-)
    totalMemory: 104857600 (bytes)
    maxMemory: 104857600
    freeMemory: 6293011

    I definitely intend to dig deeper into Java(and other JVM languages) on the AppEngine, and I'll be blogging my experiences as I go. You've been warned ;-)

    Monday, March 30, 2009

    My Joel Test

    I used to give client teams the Joel Test. It was good for the most part, but it does contain some pretty basic questions that almost all teams succeeded at. I retained a few from the Joel Test, and added some questions that I found were really essential to form a productive team.

    Retained these from the Joel Test:
    1. Can you make a build in one step?
    2. Do you fix bugs before writing new code?
    3. Do you use the best tools money can buy?
    4. Do new candidates write code during their interview?

    Added these to my list:
    5. Do you use Scrum/XP?
    6. Does your team do your own releases? (or is there a central release team that does that)
    7. Can your team start/stop your prod servers and batch jobs (of course with proper audits)?
    8. Do you insist on measuring round trip times before and after a performance enhancement was made?
    9. Do you have realtime alerts comparing inputs and outputs of your system?
    10. In enterprise integration scenarios, do you automate the validation of inputs from other teams?
    11. Do developers get to interact frequently with the end users of the system?
    12. Does your team get some choice in the frameworks they use or does your company mandate a standard one?
    13. Do your developers participate/give tech talks? Do you sponsor if needed?

    A word on Joel's Tests:
    Joel's tests have 12 questions. I have retained four. The rest got the axe because I think they are too common (using source control) or I have yet to see them in action (like hallway usability studies).

    5. Do you use Scrum/XP?
    Without going into the pros & cons of Scrum/XP, the only point I would like to add here is, both of them tend to keep the end user involved in product development. This cuts waste and gives direction.

    6. Does your team do your own releases? (or is there a central release team that does that?)
    In general, teams that can release and upgrade their applications tend to be more productive, especially during release cycles. I've seen enough teams which were forced to go through a central release team. These poor teams often email/send the upgrade commands to the release team and twiddle their thumbs. When the release breaks (of course that happens), the team is forced to 'remote debug' for the release team. The release team usually has no time/expertise to deal with these app specific issues. This brings morale down and makes upgrades a major pain for everyone on the team. Hence, they tend to lump upgrades together and avoid frequent releases altogether.

    7. Can your developers start/stop your prod servers and batch jobs (of course with proper audits)?
    Your team designed the app, architected it, and built it. They fix the bugs. If you trust them and give them responsibility of keeping the server up, they will/can handle it. If they need to bounce the server or force kick off a failed job, they have to be able to do it. If they are not able to do these tasks, then during production outages etc. the steps get executed manually, defeating the entire purpose. I do agree with the need for proper audits, but I strongly believe that depends on the manager doing enough checks.

    8. Do you insist on measuring round trip times before and after a performance enhancement was made?
    Nothing speaks like numbers in performance situations. Performance is a non-functional feature. You can see the effectiveness of the fix only if you measure the before and after effects.

    9. Do you have realtime alerts comparing inputs and outputs of your system?
    Nothing beats having checks that constantly compare the outputs with expected output for the real time input. When it goes out of sync, an alarm would be raised. This has the benefit of catching the error as soon as it occurs (failfast). This also has a benefit that you catch your error as soon as the user encountered it. This ensures a very quick turnaround time.
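
    As a rough illustration of such a check (a sketch only; the reconcile function, the order ids and the doubling rule are invented for the example), the idea is to recompute what each output should be from the live input and raise an alert on any mismatch:

```python
def reconcile(inputs, outputs, expected_fn):
    """Return one alert per input whose output is missing or unexpected."""
    alerts = []
    for key, value in inputs.items():
        expected = expected_fn(value)
        actual = outputs.get(key)
        if actual != expected:
            # Fail fast: report the key with what we expected and what we saw.
            alerts.append((key, expected, actual))
    return alerts


# Hypothetical system that is supposed to double each input quantity.
inputs = {"order-1": 100, "order-2": 250, "order-3": 7}
outputs = {"order-1": 200, "order-2": 499}
alerts = reconcile(inputs, outputs, lambda qty: qty * 2)
```

    Here order-2 is flagged as wrong and order-3 as missing; in a real setup the alerts list would feed whatever paging or alarm system the team already uses.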

    10. In enterprise integration scenarios, do you automate the validation of inputs from other teams?
    Suppose you rely on a service from another team. Having automated input validations to the values returned by the service (sometimes proactively) ensures that you catch data setup errors in systems that you depend on as well.

    11. Do developers get to interact frequently with the end users of the system?
    Above all, this makes your developers take responsibility for their code & bugs. No one likes appearing sloppy. If developers can see that their product is being used by end users who frequently talk to them it makes them put their best code forward.

    12. Does your team get some choice in the frameworks they use or does your company mandate a standard one?
    Too many companies force their developers to use a single framework/API for a task. This decision is generally made by someone 'up there' who believes too much in consistency. While I agree that having too much choice is bad, having a bit of choice is generally good. Using the right tool for the job makes all the difference.

    13. Do your developers participate/give tech talks? Do you sponsor if needed?
    Tough times, I know. But, the truly committed developer usually involves himself in attending (or better giving) talks. And, most of these talks usually pay for the speaker's expenses which makes it a no brainer.