Wednesday, March 24, 2010

What to expect while Migrating to Spring 3

Recently, I migrated one of my applications to Spring 3. The project was previously using Spring 2.0.8. Spring 2 was released in 2006, so it was high time we moved to a later version. Besides, there are plenty of reasons to migrate to Spring 3. Although Spring 3 was only released in December 2009, its milestones and other pre-releases had been actively used by the Spring community for more than a year.

The application we wanted to migrate was fairly large, with 150k lines of code (tests not included). Since it was heavily used, we couldn't afford many regression errors. Luckily, we had about 6500 unit tests and about 1700 functional tests running constantly in a continuous integration system. Having so many tests run every time we changed something went a long way toward ensuring we didn't introduce many regressions. It also considerably boosted our confidence whenever we wanted to change the application.

The first thing you will notice when upgrading from Spring 2 to Spring 3 is that it is packaged as a bunch of jars (or Maven dependencies). This changed in Spring 2.5 and is generally regarded as a good move, although it makes upgrading a bit of a pain (even with Maven). The good part is that the jar/dependency names are pretty self-descriptive. It'll take a few rounds of trial and error to get all the dependencies onto the classpath.

Now, let's move on to using Spring 3.
API Compatibility:
Spring 3 is generally API compatible with older versions. The only major difference you might notice is that your compiler may warn about generics: older classes like JdbcTemplate got a generics refit to their APIs.
For example, the return types of methods like queryForList were changed to give a more meaningful signature.
Before:
List<Map> userDetails = jdbcTemplate.queryForList

After:
List<Map<String, Object>> userDetails = jdbcTemplate.queryForList

Apart from these, there are also a few minor API changes.
Example: AbstractBeanFactory.getMergedBeanDefinition(..) returns BeanDefinition instead of RootBeanDefinition.
Most of these API changes are minor and should be easy to resolve.

LifeCycle:
If you are using any of these beans and had any of them declared as lazy-init, you may be in for a small surprise:
DefaultMessageListenerContainer
GenericMessageEndpointManager
JmsMessageEndpointManager
SchedulerFactoryBean
SimpleMessageListenerContainer

These components implement the new SmartLifecycle interface and are eligible for auto-startup. Previous versions of Spring lacked a clean way to start services in a given order. The depends-on attribute doesn't really help when there are a lot of components to start, especially if they are spread across a bunch of XML files. The Lifecycle interface provided a crude way of achieving ordered bean startup, albeit with some manual coding. The Phased and SmartLifecycle interfaces take it a step further and make it possible to start your services automatically when Spring starts up. More on this in a later blog post.

A SmartLifecycle bean can declare that it needs to be auto-started (by returning true from its isAutoStartup() method). When the ApplicationContext starts, it looks for any SmartLifecycle beans present in the context and initializes all of them, even those marked as lazy. It then selectively starts the ones marked for auto-startup. This means that any of these beans previously declared lazy-init will now be initialized when the application context starts. This is not really a bug, since Spring can only call isAutoStartup() after the bean has been initialized. The consolation, if any, is that these beans are initialized in a later phase, during the startup of SmartLifecycle beans.
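To make the lazy-init surprise concrete, here is a stdlib-only Java sketch of the startup sequence described above. It does not use Spring: `AutoStartable`, `MiniContext` and `Listener` are hypothetical stand-ins, not Spring classes. It shows the key point: the container can only ask `isAutoStartup()` of an instance, so lazy lifecycle beans get created anyway, while ordinary lazy beans stay untouched; auto-startup beans are then started lowest phase first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class LifecycleSketch {
    // Hypothetical stand-in for Spring's SmartLifecycle; not the real interface.
    public interface AutoStartable {
        boolean isAutoStartup();
        int getPhase();
        void start();
    }

    // A bean definition: declared type, lazy-init flag and a factory.
    static final class Definition {
        final Class<?> type;
        final boolean lazy;
        final Supplier<Object> factory;
        Definition(Class<?> type, boolean lazy, Supplier<Object> factory) {
            this.type = type; this.lazy = lazy; this.factory = factory;
        }
    }

    // Hypothetical mini container, only meant to show the startup sequence.
    public static final class MiniContext {
        private final Map<String, Definition> definitions = new LinkedHashMap<>();
        private final Map<String, Object> singletons = new LinkedHashMap<>();
        public final List<String> events = new ArrayList<>();

        public void register(String name, Class<?> type, boolean lazy, Supplier<Object> factory) {
            definitions.put(name, new Definition(type, lazy, factory));
        }

        public Object getBean(String name) {
            Object bean = singletons.get(name);
            if (bean == null) {
                bean = definitions.get(name).factory.get();
                singletons.put(name, bean);
                events.add("created " + name);
            }
            return bean;
        }

        public void refresh() {
            // 1. Eagerly create the non-lazy singletons.
            for (Map.Entry<String, Definition> e : definitions.entrySet()) {
                if (!e.getValue().lazy) getBean(e.getKey());
            }
            // 2. Lifecycle beans are created even when lazy, because
            //    isAutoStartup() can only be asked of an instance.
            List<AutoStartable> candidates = new ArrayList<>();
            for (Map.Entry<String, Definition> e : definitions.entrySet()) {
                if (AutoStartable.class.isAssignableFrom(e.getValue().type)) {
                    candidates.add((AutoStartable) getBean(e.getKey()));
                }
            }
            // 3. Start only the auto-startup beans, lowest phase first.
            candidates.sort(Comparator.comparingInt(AutoStartable::getPhase));
            for (AutoStartable bean : candidates) {
                if (bean.isAutoStartup()) bean.start();
            }
        }
    }

    // Hypothetical lifecycle bean (think message listener container).
    static final class Listener implements AutoStartable {
        private final String name;
        private final int phase;
        private final List<String> log;
        Listener(String name, int phase, List<String> log) {
            this.name = name; this.phase = phase; this.log = log;
        }
        public boolean isAutoStartup() { return true; }
        public int getPhase() { return phase; }
        public void start() { log.add("started " + name); }
    }

    public static List<String> demo() {
        MiniContext ctx = new MiniContext();
        // All three beans are declared lazy.
        ctx.register("jmsListener", Listener.class, true, () -> new Listener("jmsListener", 10, ctx.events));
        ctx.register("scheduler", Listener.class, true, () -> new Listener("scheduler", 0, ctx.events));
        ctx.register("reportService", Object.class, true, Object::new);
        ctx.refresh();
        return ctx.events;
    }

    public static void main(String[] args) {
        // created jmsListener, created scheduler, started scheduler, started jmsListener
        demo().forEach(System.out::println);
    }
}
```

Note that `reportService` is never created: only beans whose type says they take part in the lifecycle are pulled out of their lazy state.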

In our application, we were using a few of these beans. In our tests, we counted on these beans not getting initialized because of the lazy-init attribute. But with Spring 3 they were all getting initialized, which was undesirable. The correct resolution was to move these beans out of that context into a separate context XML file that is never loaded during testing.

We've been running Spring 3 in production for a little over 2 weeks now, with no issues. Overall, the Spring 3 upgrade was pretty painless and straightforward. I think the Spring team did a good job of trying to preserve API compatibility and to make sure that there were no regression errors in the APIs.

If you are using an older version of Spring, it is time to move to Spring 3.

Of course, you should follow me on Twitter.

Friday, March 5, 2010

The first Date sucked. Can a second Date fix it? JSR 310

Well, technically, this is the third Date API. But it makes a much better title!

JSR 310 is the new Date and Time API specification. I'm overjoyed that Java will finally get the long overdue revamp of the infamous java.util.Date and java.util.Calendar classes. It is nice to see that JSR 310 builds on the Joda Time API, unlike some other APIs that didn't follow this approach (cough: java.util.logging).

JSR 310 is in early draft review, and they are looking for feedback on the API. So I decided to take a crack at it.

There are a lot of things done right with this API. One of the nicest things is not having to mess around with the pesky Calendar.add(int,int) method. Among a lot of other things, JSR 310 introduces three new classes for date/time handling: LocalDate, LocalTime, LocalDateTime.

Now let's consider what could be better:

You got the money, now give me my constructors:
One thing I don't like is the absence of usable constructors in the three classes LocalDate, LocalTime and LocalDateTime. There are static factory methods you can use to create the objects, but having constructors for the simple use cases would be nice.

Since LocalDate, LocalTime and LocalDateTime have similar interfaces, I'm going to pick on only one of them. The one that will take the beating is: LocalDate :-)

Compare the following:
Joda Time:
LocalDate date = new LocalDate(2010, 2, 20);

JSR 310:
LocalDate date = LocalDate.of(2010, 2, 20);
This may not be such a big deal to many folks, but I don't see why the constructors shouldn't be exposed especially given that this class is final!

Keep it simple 1: Printing Dates
Joda Time:
LocalDate date = ...
String dateString = date.toString("yyyy/MM/dd");

JSR 310:
LocalDate date = ...
DateTimeFormatter formatter = new DateTimeFormatterBuilder().appendPattern("yyyy/MM/dd").toFormatter();
String dateString = formatter.print(date);

Looks like we could use an overloaded toString method in the LocalDate class that works like Joda Time's. I imagine it wouldn't cover all the cases that DateTimeFormatterBuilder handles, but it could at least address the most common use case. I could be wrong; I haven't had a chance to use the API much. If so, please feel free to comment and let me know.
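For comparison only: the `java.time` API that JSR 310 eventually shipped as in Java 8 (package `java.time`, not the draft's `javax.time`) is not the early draft discussed here, but it offers both routes. A minimal sketch against that final API: the builder route mirrors the draft code above, and `LocalDate.format` with `DateTimeFormatter.ofPattern` is the shorter path.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class PrintDates {
    // Builder route, closest to the draft API shown above.
    public static String viaBuilder(LocalDate date) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd")
                .toFormatter();
        return formatter.format(date);
    }

    // Shorter route available in Java 8's java.time.
    public static String viaOfPattern(LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2010, 2, 20);
        System.out.println(viaBuilder(date));   // 2010/02/20
        System.out.println(viaOfPattern(date)); // 2010/02/20
    }
}
```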

Keep it simple 2: Parsing Dates
There is a static parse method in LocalDate:
LocalDate date = LocalDate.parse("2010-02-20");
The format you have to pass it is fixed. For the same reasons as above, it would be nice to see an overloaded method that takes a format string.
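Again for comparison with the API as it later shipped in Java 8 (not the draft reviewed here): `LocalDate.parse(CharSequence)` accepts only the ISO form with two-digit month and day, while the overload `LocalDate.parse(CharSequence, DateTimeFormatter)` takes a caller-supplied pattern. A small sketch; the helper names `isoAccepts` and `parseWithPattern` are illustrative only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ParseDates {
    // ISO-8601 only: yyyy-MM-dd with two-digit month and day.
    public static boolean isoAccepts(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // The overload that takes a pattern chosen by the caller.
    public static LocalDate parseWithPattern(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(isoAccepts("2010-02-20")); // true
        System.out.println(isoAccepts("2010-2-20"));  // false
        System.out.println(parseWithPattern("2010-2-20", "yyyy-M-d"));   // 2010-02-20
        System.out.println(parseWithPattern("20/02/2010", "dd/MM/yyyy")); // 2010-02-20
    }
}
```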

DateAdjusters is awesome:
Example:
LocalDate friday = ...
LocalDate monday = DateAdjusters.nextNonWeekendDay().adjustDate(friday);

Consider a case where we want the next valid working (or custom) day instead of the next non-weekend day. DateAdjuster is an interface, so we could write a custom implementation, but providing a method that lets developers hook into the DateAdjusters class would be great. Something like:

LocalDate nextWorkingDate = DateAdjusters.nextWorkingDay(new WorkingDayDeterminer() {
    public boolean isWorkingDay(LocalDate date) {
        return ...;
    }
}).adjustDate(friday);
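In the Java 8 `java.time` API that eventually shipped, the closest equivalent to this proposal is a `TemporalAdjuster`, and `TemporalAdjusters.ofDateAdjuster(UnaryOperator<LocalDate>)` turns a lambda into one. A sketch under that assumption: a plain `Predicate<LocalDate>` plays the role of the hypothetical `WorkingDayDeterminer`, and the holiday set is made up for the example.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjuster;
import java.time.temporal.TemporalAdjusters;
import java.util.Collections;
import java.util.Set;
import java.util.function.Predicate;

public class WorkingDays {
    // Moves to the first day strictly after the input for which isWorkingDay
    // is true. Assumes the predicate eventually accepts some day.
    public static TemporalAdjuster nextWorkingDay(Predicate<LocalDate> isWorkingDay) {
        return TemporalAdjusters.ofDateAdjuster(date -> {
            LocalDate candidate = date.plusDays(1);
            while (!isWorkingDay.test(candidate)) {
                candidate = candidate.plusDays(1);
            }
            return candidate;
        });
    }

    // Hypothetical rule: weekdays that are not in the given holiday set.
    public static Predicate<LocalDate> weekdaysExcept(Set<LocalDate> holidays) {
        return d -> d.getDayOfWeek() != DayOfWeek.SATURDAY
                && d.getDayOfWeek() != DayOfWeek.SUNDAY
                && !holidays.contains(d);
    }

    public static void main(String[] args) {
        LocalDate friday = LocalDate.of(2010, 3, 5);
        Set<LocalDate> holidays = Collections.singleton(LocalDate.of(2010, 3, 8));
        System.out.println(friday.with(nextWorkingDay(weekdaysExcept(holidays)))); // 2010-03-09
    }
}
```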

So let's start using it:
Not so fast. We typically use dates and times with APIs like JDBC (does anyone use straight JDBC anymore?), Quartz, etc. Unless these APIs are upgraded to use the JSR 310 classes, we won't reap the full benefits of the new classes.

Also, the original Date and Calendar classes are now the IE6 of the Java API. They must die! Or at least be deprecated, so that adoption of the new classes increases.

Finally, what is with the javax.time package name? Not that it bothers me too much, but java.time has an unmatched sex appeal to it.

Wednesday, May 27, 2009

Live Google IO notes

Effective GWT: Developing a complex, high-performance app with Google Web Toolkit

  • Lombardi Software - Blueprint
  • GWT What and Why
    • Generates optimized javascript (like escape analysis etc)
  • High Fidelity Mockup
    • Done in Photoshop
    • More expensive
    • Finalize the icons and colors etc
  • Going to code
    • Be involved in the design
    • You need to know css and HTML DOM
    • What is the appropriate DOM structure
    • How to create and manipulate it with GWT
  • Design
    • Design outer layer with divs
    • A faster way is to use an HTML panel and divs? (what does that mean?)
    • DOM structure is created by GWT decorator panel
    • And, you can apply css on them
  • Handling Window Resizing
    • Goal is to handle browser window resizing
    • With static HTML, you're limited to what you can achieve in CSS
    • Listen to ResizeEvent from window and propagate sizes down to children
    • Because they only use fixed-size rows, they can use background images to style table rows
  • Animation
    • Not all browsers support CSS3?
    • Helps users understand the behavior of application (provides visual feedback)
    • Done all in java in GWT
  • Original Implementation
    • Iterate through your objects, create widgets and add them to containers
    • JavaScript object creation and GC are expensive, especially in IE6
  • New Implementation
    • Generate raw HTML in Javascript
    • Use flyweight pattern for event handling
    • They create html inside java (javascript) and do a DOM.setInnerHTML()
  • Event Handling
  • When All Else Fails
    • They dual compile code to Java and Javascript
    • If they find that the browser's JavaScript engine is slow, they render on the server and send the result to the client
    • So based on performance, they can dynamically move rendering between server and client
  • Compiling GWT code is slow
    • By default, GWT compiles code for 5 different browsers
    • You can tell GWT to compile code only for a single browser and locale; this speeds up development time
    • Or you can run hosted mode, which never compiles :-), or use GWT 2.0, which supports out-of-process hosted mode!
    • Instead of doing DOM manipulation over objects like Element.getStyle().setProperty('css property'), put that property in a css file
    • Check out the Episodes plugin from the creator of YSlow. It sends client performance numbers back to the server.


This presentation was more of a war story about the Blueprint product. Because most of their clients run it in IE6, Lombardi had to go through extra steps to make their application rely less on IE6's JavaScript engine. These techniques also apply when you have a really rich GWT application.
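The "new implementation" from the talk (generate one HTML string instead of many widgets, then handle events with a single shared handler) can be sketched in plain Java, outside GWT. The names `Task`, `renderRows` and `RowClickDispatcher` are hypothetical; in a GWT app the string would be handed to `setInnerHTML()` and the dispatcher attached once to the container. The flyweight part is that one handler object serves every row, finding the row's data through an index encoded in the markup.

```java
import java.util.Arrays;
import java.util.List;

public class FlyweightRows {
    // Hypothetical row model.
    public static final class Task {
        public final String title;
        public Task(String title) { this.title = title; }
    }

    // Renders every row into one HTML string instead of one widget per row.
    public static String renderRows(List<Task> tasks) {
        StringBuilder html = new StringBuilder("<table>");
        for (int i = 0; i < tasks.size(); i++) {
            html.append("<tr data-row=\"").append(i).append("\"><td>")
                .append(escape(tasks.get(i).title))
                .append("</td></tr>");
        }
        return html.append("</table>").toString();
    }

    // Minimal HTML escaping so a title cannot inject markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // The flyweight: a single handler shared by all rows. The index read from
    // the clicked element's data-row attribute selects the model object.
    public static final class RowClickDispatcher {
        private final List<Task> tasks;
        public RowClickDispatcher(List<Task> tasks) { this.tasks = tasks; }

        public String onClick(String dataRowAttribute) {
            Task task = tasks.get(Integer.parseInt(dataRowAttribute));
            return "clicked " + task.title;
        }
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(new Task("Design"), new Task("Review <draft>"));
        System.out.println(renderRows(tasks));
        System.out.println(new RowClickDispatcher(tasks).onClick("1")); // clicked Review <draft>
    }
}
```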

Transactions Across Datacenters (and Other Weekend Projects)
  • Consistency
    • Talked about Weak Consistency, Eventual Consistency (thanks to Werner for making this popular), Strong consistency (AppEngine datastore, File systems, RDBMSes, Azure tables)
  • Transactions
  • Why across datacenters?
    • Catastrophic failures, expected failures, routing maintenance, geo locality (CDN, edge caching etc)
    • Basically vertically partitioning your application
    • Packet roundtrip from west to east coast is 30ms
  • Why not across datacenters
    • Within a datacenter, communication costs much less and latency is low (1ms within rack, 1-5 ms across)
    • Outside datacenter
      • Expensive
      • High latency
  • Multihoming
    • ????
    • As soon as you write across multiple locations, you will have consistency problems
    • Realtime writes are always the hardest
    • Don't do it
    • A datacenter in Silicon Valley went down, and Twitter and FriendFeed were down for more than 2 hrs. Neither had multihoming
  • Option 2:
    • Better but not ideal
      • Have multiple datacenters, have primary and secondary
      • Mediocre at catastrophic failure
      • window of lost data because of asynchronous replication
    • Examples:
      • Amazon Web Services
      • Banks, Brokerages etc
    • Depending on systems, all your slaves can serve reads
  • Option 3: True Multihoming
    • Simultaneous writes in different data centers
    • Two way: hard
    • NASDAQ uses 2 datacenters and does two-phase commit across them for transactions
    • Expensive and definitely slower
  • Techniques and Tradeoffs
    • Backups
      • Make a copy
      • Dog Fooding - they make other teams use their internal systems so they get to iterate and then release the API
    • Master/Slave replication
      • Usually asynchronous
        • Good for throughput, latency
      • Most RDBMSs do binary log based replication
      • AppEngine also follows this model.
      • AppEngine write is much slower than a relational db
      • But, it is geared for read more than write
    • Multi Master Replication
      • Support writes at multiple locations and then merge them
      • Asynchronous, eventual consistency (Amazon's shopping cart service does this)
      • You cannot rely on a global clock
      • Because of this, you cannot do global transactions
      • Another way of thinking about this: it is like multithreading without locks
    • Two Phase Commit
      • Heavyweight, synchronous, high latency
      • Semi distributed as there is a coordinator
    • Paxos
      • Fully distributed consensus protocol
      • No single master like 2PC
      • Still has longer latency
      • Gives a better throughput than 2PC
  • Paxos for the Datastore
    • Closer datacenter? not really because you are doing two round trips
    • Same datacenter? no
    • Opt In...
  • Paxos for AppEngine
    • They use that to coordinate when moving between datacenters
    • Use a lock server
    • Managing memcache
  • Conclusion
    • No silver bullet
    • Embracing tradeoffs
    • Consistency is app driven, the platform cannot make that choice.
    • AppEngine is going to support options in consistency models in future (Nice)
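Two-phase commit from the list above can be illustrated with a single-process Java simulation: the coordinator asks every participant to prepare (vote), then commits everywhere only if all voted yes, otherwise aborts everywhere. `Participant`, `run` and the "west"/"east" replicas are hypothetical; a real implementation also needs durable logs, timeouts and recovery when the coordinator fails, which is the coordinator dependence the notes contrast with Paxos.

```java
import java.util.Arrays;
import java.util.List;

public class TwoPhaseCommit {
    public enum State { INIT, PREPARED, COMMITTED, ABORTED }

    // Hypothetical participant, e.g. a replica in one datacenter.
    public static final class Participant {
        final String name;
        final boolean canCommit;  // simulated outcome of local validation
        public State state = State.INIT;

        public Participant(String name, boolean canCommit) {
            this.name = name;
            this.canCommit = canCommit;
        }

        // Phase 1: vote. A yes vote is a promise to commit if asked.
        boolean prepare() {
            if (canCommit) state = State.PREPARED;
            return canCommit;
        }

        // Phase 2 outcomes.
        void commit() { state = State.COMMITTED; }
        void abort() { state = State.ABORTED; }
    }

    // Coordinator logic: two round trips to every participant.
    public static boolean run(List<Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {      // phase 1: prepare
            if (!p.prepare()) { allYes = false; break; }
        }
        for (Participant p : participants) {      // phase 2: commit or abort
            if (allYes) p.commit(); else p.abort();
        }
        return allYes;
    }

    public static void main(String[] args) {
        List<Participant> ok = Arrays.asList(new Participant("west", true), new Participant("east", true));
        System.out.println(run(ok) + " " + ok.get(1).state);   // true COMMITTED
        List<Participant> bad = Arrays.asList(new Participant("west", true), new Participant("east", false));
        System.out.println(run(bad) + " " + bad.get(0).state); // false ABORTED
    }
}
```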



Building Scalable Complex apps on AppEngine:

  • List Property
    • Property has multiple values
    • Maintains its order
    • Queried with an equals filter
    • Densely pack information instead of denormalizing it
    • Cut across all data and query on one of the values in the list property
    • select * from FavoriteColors where color = 'yellow' (here color is a list property)
    • Saves space to use list property
    • Uses more CPU to serialize and deserialize the list property
    • Never have a composite index on two list properties, because it creates a Cartesian product index
  • Concrete Example: Microblogging
    • Fanout of messages can be inefficient in terms of space
    • Message sending by reference
    • You would use list properties instead of joins
    • select * from messages where receiver = 'user'
  • Problem with List Property
    • selects load all of the list properties
  • Relational Index Entity
    • Split the message into two entities (message index and message)
    • We put them into same entity group and make message index a child of the message
    • There is a keys-only query that lets you fetch just the keys
    • Reads are 10 times faster and cheaper than with just plain list properties
  • Merge Join
    • AppEngine supports self joins
    • Data mining like operations
    • Don't have to build indexes in advance before this query
    • Can be used to test set membership
  • How does Merge Join work?
    • They don't have histograms (RDBMSs use histograms to make a query plan)
    • They store all property indexes in sorted order
    • Uses zigzag algorithm
    • If we are using 2 filters, it scans the first property's index to find a match, then moves to the second one to find a match. Then, if the keys don't match, it moves the first one forward until both the property and the key match
    • select * from animal where legs = 4 and type = 'cow'
    • Scales with number of filters
    • Can't apply sort orders - must sort in memory
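The zigzag merge join described above can be sketched over in-memory indexes. Assumption: each equality filter (such as `legs = 4` or `type = 'cow'`) has already been turned into a list of entity keys sorted ascending, which is what scanning one property index for one value yields. Each round takes the largest key under any cursor, seeks every cursor forward to it, and emits the key only when all cursors agree. This is a simplified illustration, not App Engine's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZigZagJoin {
    // Returns keys present in every list (each sorted ascending, no duplicates).
    public static List<Long> join(List<long[]> sortedKeyLists) {
        List<Long> result = new ArrayList<>();
        int n = sortedKeyLists.size();
        if (n == 0) return result;
        int[] pos = new int[n];
        while (true) {
            // Largest key currently under any cursor.
            long max = Long.MIN_VALUE;
            for (int i = 0; i < n; i++) {
                long[] keys = sortedKeyLists.get(i);
                if (pos[i] >= keys.length) return result;  // an index is exhausted
                max = Math.max(max, keys[pos[i]]);
            }
            // Zig/zag: seek every cursor to the first key >= max.
            boolean allMatch = true;
            for (int i = 0; i < n; i++) {
                long[] keys = sortedKeyLists.get(i);
                pos[i] = seek(keys, pos[i], max);
                if (pos[i] >= keys.length) return result;
                if (keys[pos[i]] != max) allMatch = false;
            }
            if (allMatch) {
                result.add(max);
                for (int i = 0; i < n; i++) pos[i]++;  // step past the match
            }
        }
    }

    // Like an index seek: first position at or after 'from' whose key >= target.
    static int seek(long[] keys, int from, long target) {
        int idx = Arrays.binarySearch(keys, from, keys.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        long[] legsEquals4 = {2, 3, 5, 8, 13, 21};   // keys with legs = 4
        long[] typeEqualsCow = {3, 4, 8, 9, 21, 30}; // keys with type = 'cow'
        System.out.println(join(Arrays.asList(legsEquals4, typeEqualsCow))); // [3, 8, 21]
    }
}
```

As the notes say, the output comes back in key order, so any other sort order has to be applied in memory afterwards.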

Keynote 2

  • New Google Product Google Wave
    • Platform
    • Product
    • Protocol
  • A wave is a conversation between multiple people.
  • Wave can be viewed as an enhanced Twitter or a hybrid (e-mail, IM, and word document)
    • Comment Support - Allows for inline comments or your typical comments at the end of the wave/posting.
    • Real-time updates from others - The document changes real-time while others are updating it.
    • Spell Check - Very sweet inline, real-time spell checking that takes the word context into account, e.g., for "Can I have some been soup" it offered "Can I have some bean soup"
    • Play back changes - You can play back the changes in the conversation to see what was done and in what order. Very much like playing back a video.
    • Private replies - Support for private conversations between users, hidden from others on the wave.
    • Drag and Drop - Wave supports D&D from iPhoto.
    • Uses Google Contacts - Integrates with your Gmail and GTalk contacts
    • Wave cloning - Wave allows for cloning of an existing Wave to create a new wave. Reminds me of Git :)
      • When this occurs, all subscribers or people in the wave are notified.
    • Inline editing - Supports inline editing from Wave and external websites that are using the Wave plug-ins.
      • When changes are made, the document is marked up to reflect where the changes were done, ONLY from the last time *you* viewed it.
    • Collaboration Editing - Awesome support for changing content in the same document in the same area and changes are reflected in different colors.
    • Open Social Integration
      • Any open social app can live inside wave
    • Developer API -
      • A demo was given of a custom widget that allows a user to vote "Yes, Maybe, No"
      • Sudoku Widget that allows multiple players to play with one another.
      • Chess Widget that allows others to play one another and uses the playback feature. Nice integration.
      • Google Maps Widget that shows all other users where you're looking. Also, draw regions, add pins and more! Very sweet!!
    • Search - You can search your contacts or use their built in Google search to search the web and actually use the results in the document. i.e., search for an image and select it to embed it into the document.
    • Multiple languages - Supports multiple languages.
      • Real-time language translation using a program called "Rosy". This was very sweet!
    • Polls - A nice little extension allows for creating polls.
    • System Federation between Wave 'systems'
      • Wave systems can collaborate between one another.
      • Private Waves between people within the same 'wave system' are never sent to other wave servers in the federation.
  • Forms are native to Wave
  • They are going to Open Source the 'lion' share of the code.
  • Written in GWT and HTML5
  • Demos
    • There was a demo of dragging and dropping a file into the browser to create an attachment. This is not supported in HTML5 yet; it is a prototype.
    • A nice demo of how the API is used to write an external application like a blog. Very cool demo.
    • Orkut demo using their embedded API.
    • Nice twitter integration that signs in to twitter and actually will post tweets.
    • Very sweet code.google.com integration tool.
  • All attendees at Google I/O will get an account to use Google Wave before it's released. Hence, come to Google IO. Did I mention we also got a free cell phone with a full month of unlimited service?
  • Comes with a developers API (We're talking about Google, it's expected! :)
  • A minor bug occurred during the demo. Hey, we're talking about live demos (it turned out to be a misconfigured browser proxy).
  • Google Wave is great for team collaboration: adding inline comments, embedding images, viewing changes, seeing live changes, removing the conversation noise, releasing a final product, and more.

  • Website URLs
    • http://wave.google.com - Main website for Wave
    • http://code.google.com/apis/wave - API website
    • http://waveprotocol.org - The protocol website

Offline Processing on App Engine: A Look Ahead

By: Brett
Live Notes by @dushyanth

  • Motivation
    • AppEngine is great for request based database backed applications
    • Cron is good for periodic jobs, but not good enough
  • Problems with Polling
    • Wasted work as it is not event driven
    • Workers stay resident when there is no work wasting resources
    • Fixed number of workers. Or admins must manually add workers
    • Limited amount of optimization possible
      • Long-lived hanging connections
    • Existing task queue like systems
        • MQ, Amazon SQS, Azure Queues, Starling (getting popular these days)
    • Task Queue API
      • Part of AppEngine Labs (API may change until it graduates from Labs)
      • Asynchronous execution for a first in first out queue.
      • If execution fails, work will be retried until successful
      • Tasks are light weight to store. They are 3 times faster than storing in the datastore.
      • Tasks are scalable. The tasks can be started across a lot of machines.
      • Implements queueing. NOT pub-sub
      • Goals: High throughput, maximizing data throughput
      • Pushes tasks to the app. No polling
      • Uses Web hooks (It is a RESTful push-based interface for doing work)
      • Task is submitted as a web hook. If you get a 200 back, it succeeds.
      • Essentially combines queuing over REST.
      • Integrated into admin console as normal requests
      • Supports config driven throttling
        • Can be used to prevent (external) web services from getting overloaded
        • Stay inside budget per hour etc
    • How Task Queue works
      • Tasks enqueue in a queue
      • Queue Moderator pulls from the head of the queue
      • It submits the task to the workers. Queue Moderator has capability to create new workers (threads).
      • Max number of threads depends on throughput
      • When a task is submitted, it could be running even before the enqueue request API call returns :-)
    • EdgeCases
      • Tasks have to be idempotent
      • Possible for a task to spuriously run twice even without failures.
      • You could use memcache or database to avoid it running twice, but that responsibility is on the developer
    • Working with TaskQueues
      • Each task added to a single queue
      • You can create multiple queues per application
      • Working with ETA (Estimated time of Arrival)
        • How long until the task is executed
        • Different than "visibility timeouts" in other systems
      • Working with tasks: Names
        • Tasks can be named. If a task is not named, it is auto generated
        • Prevents tasks from accidentally being submitted multiple times
    • Concrete Example: Write behind cache
      • Minimizes writes with repeated cache flushing
        • Write new data to cache
        • Periodically read cache and persist to disk
      • To implement, user submits data to cache and a task to task queue
      • When the task queue is processed, task is dispatched and the task does a periodic read from the cache and writes to the datastore. Essentially using the TaskQueue as an executor
    • Python only at first. Java comes next
    • Java support in the works
      • Webhooks, JMS
    • The Future
      • Batch Processing
        • Task Queue is good for small datasets (< 100k)
        • More tools needed for parallelization
      • Map Reduce in future
        • Eventually
        • Want it to work with small and large (Terabyte scale) datasets
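The write-behind cache example, together with the named-task and idempotency points above, can be sketched in plain Java. Everything here is hypothetical and in-memory: one `Map` stands in for memcache, another for the datastore, and `TaskQueue` is a toy FIFO that refuses a task whose name was already used, mirroring how naming prevents duplicate submission. This is not the App Engine Task Queue API. The flush task is idempotent, so a spurious second run would leave the datastore in the same state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WriteBehindSketch {
    // Toy FIFO task queue; task names make enqueueing idempotent.
    public static final class TaskQueue {
        private final Deque<Runnable> tasks = new ArrayDeque<>();
        private final Set<String> usedNames = new HashSet<>();

        // Returns false if a task with this name was already enqueued.
        public boolean add(String name, Runnable task) {
            if (!usedNames.add(name)) return false;
            tasks.addLast(task);
            return true;
        }

        // Runs queued tasks in order; a real system would push them to workers.
        public int drain() {
            int count = 0;
            while (!tasks.isEmpty()) {
                tasks.pollFirst().run();
                count++;
            }
            return count;
        }
    }

    public final Map<String, Integer> cache = new HashMap<>();     // stands in for memcache
    public final Map<String, Integer> datastore = new HashMap<>(); // stands in for the datastore
    public int datastoreWrites = 0;
    private final TaskQueue queue;

    public WriteBehindSketch(TaskQueue queue) { this.queue = queue; }

    // Write to the cache and schedule at most one flush per time window.
    public void write(String key, int value, long windowId) {
        cache.put(key, value);
        queue.add("flush-" + windowId, this::flush);
    }

    // Idempotent: copies the current cache contents; rerunning is harmless.
    void flush() {
        for (Map.Entry<String, Integer> e : cache.entrySet()) {
            datastore.put(e.getKey(), e.getValue());
            datastoreWrites++;
        }
    }

    public static void main(String[] args) {
        TaskQueue queue = new TaskQueue();
        WriteBehindSketch app = new WriteBehindSketch(queue);
        app.write("counter", 1, 0);
        app.write("counter", 2, 0);  // same window: no second flush task
        app.write("counter", 3, 0);
        System.out.println(queue.drain() + " task(s), datastore=" + app.datastore
                + ", writes=" + app.datastoreWrites); // 1 task(s), datastore={counter=3}, writes=1
    }
}
```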


    The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore

    Live Notes by @dushyanth

    • Datastore is
      • Transactional
      • Natively Partitioned - developer does not have to worry about scaling
      • Hierarchical - every entity can have a notion of parent
      • Schemaless - no restricted structure
      • Based on BigTable
      • Not a relational database
      • Not a SQL Engine
    • Simplifying Storage
      • Simplifies
        • Development
        • Management of applications
      • Scale always matters
        • Request volume
        • Data volume
    • Datastore Storage Model
      • Entity consists of
        • Kind
        • Key
        • Entity Group
        • 0..n properties
        • If entity group == key, the entity is a parent
      • Heterogeneous property types. Properties can be of different types in different entities
      • Supports multi valued properties
      • Variable properties - entities do not need to have the same properties
      • Soft Schema
        • It is a schema whose constraints are enforced only in the application layer
        • Simpler development process
          • Rapid, typesafe prototyping
        • Can be enforced by JDO or JPA metadata mappings
    • Transactions
      • Transactions only work within an entity group
    • Relationship Management
      • JDO and JPA are not just about object relationships
        • Transparent persistence
        • Object view of your data
        • Centralized mapping
        • Big maintainability win
        • AppEngine decides and manages which entity group the entity belongs to
        • Uses ownership to enforce entity group colocation
    • Future JDO/JPA work
      • Support unowned relationships
    • Bringing existing code to App Engine
      • Datastore is not a drop in replacement for RDBMS
      • Plan for data migration
      • Primary Keys
        • Single property keys: Straightforward way to map single property keys
        • Composite keys: Can map to ancestor chain
        • Mapping table: Can be represented using multi-value properties
          • And can be queried with set membership
      • Transactions
        • Identify roots in the data model
        • Identify operations that transact on multiple roots
        • Analyze impact of partial success
          • Refactor
          • Run compensating logic
      • Queries
        • Shift processing from reads to writes.
          • Denormalize
          • Expensive write and cheap reads
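The "shift processing from reads to writes" advice can be sketched in plain Java: each write also updates a denormalized summary (comment count and latest comment), so the read path is a single lookup instead of a query over all comments. `DenormalizedPosts`, `PostSummary` and `addComment` are hypothetical names, and the maps are in-memory stand-ins for datastore entities. Given the entity group limit noted above, on App Engine the post and its comments would need to share an entity group for both updates to happen in one transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DenormalizedPosts {
    // Hypothetical denormalized summary stored next to the post.
    public static final class PostSummary {
        public int commentCount;
        public String latestComment;
    }

    private final Map<String, List<String>> commentsByPost = new HashMap<>(); // normalized data
    private final Map<String, PostSummary> summaries = new HashMap<>();       // denormalized copy

    // Expensive write: store the comment AND update the precomputed summary.
    public void addComment(String postId, String text) {
        commentsByPost.computeIfAbsent(postId, k -> new ArrayList<>()).add(text);
        PostSummary s = summaries.computeIfAbsent(postId, k -> new PostSummary());
        s.commentCount++;
        s.latestComment = text;
    }

    // Cheap read: one lookup, no scan or count over the comments.
    public PostSummary summary(String postId) {
        PostSummary s = summaries.get(postId);
        return s != null ? s : new PostSummary();
    }

    public static void main(String[] args) {
        DenormalizedPosts store = new DenormalizedPosts();
        store.addComment("post-1", "First!");
        store.addComment("post-1", "Nice write-up");
        PostSummary s = store.summary("post-1");
        System.out.println(s.commentCount + " / " + s.latestComment); // 2 / Nice write-up
    }
}
```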


    Google Wave Client: Powered by GWT

    Live Notes by @dushyanth
    • Wave UI requirements
      • It has to be fast
      • Stunning
      • Optimistic UI
    • JSNI
      • Java can call javascript
    • Client Architecture
      • Bidirectional communication channel - keep alive http
      • Protocol Compiler
        • Generates interfaces, client + server implementations
    • GWT
      • Code Heavy
        • Can use UIBinder to plop GWT components into html
      • Most bugs are from CSS
        • Style Injector + CssResource
        • Looks like Minification + Image Spriting is done by GWT
        • Allows modularization of CSS
        • Different CSS for different browsers
      • Inefficient JSON handling
        • JSO - Javascript object structure
        • Didn't quite get it
      • Hosted mode isn't quite browser like
        • OOPHM (Out Of Process Hosted Mode) - Browser plugin to debug in eclipse
      • Download Size
        • runAsync(dynamic loading of code)
        • Download lazily
      • No transparency between javascript and java
        • SOYC (Story of your compile) reports
        • Java package to javascript breakdown report
      • JSOs cannot implement interfaces
        • SingleJsoImpl
        • In order to inline, JSOs cannot have polymorphic dispatch
        • At most one JSO class can implement a given interface
    • Improving Gears
      • Client side thumbnailing
        • They create a thumbnail using the workerpool before uploading the image to server.
      • Desktop drag n drop
      • Resumable uploading
    • Performance
      • Startup
        • runAsync
        • fast start
        • inline images + css
        • smaller download
        • stats collection
        • server-side script selection
          • Server sends down the correct javascript + css files based on http headers
      • Loaded Client
        • Optimistic UI (trying to guess what the user will click next)
        • Prefetching
        • Flyweight pattern
        • Rendering tricks
    • Mobile Client
      • Deferred binding saves the day
      • iPhone browser is always running
      • It loads faster than native apps
    • Testing
      • Use Model View Presenter design pattern - how is it different from MVC?
      • Prefer JUnit tests over GWTTestCase
      • Browser automation - WebDriver
      • WebDriver is a developer-focused tool for browser automation
      • Has native keyboard and mouse events, rather than synthesised via JS
      • iPhone Driver - automated testing on iPhone
      • Remote Web Driver - so web testing can be farmed out into a grid
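On the Model View Presenter question: one common reading is that the presenter holds the UI logic and talks to a passive view only through a small interface, so the logic can be exercised by a plain JUnit test with a fake view, without a browser or `GWTTestCase`. A minimal sketch under that assumption; `SearchView`, `SearchPresenter`, `FakeView` and the backend function are hypothetical names, and `FakeView` plays the part a test double would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class MvpSketch {
    // Passive view interface; a GWT widget would implement it in production.
    public interface SearchView {
        String getQuery();
        void showResults(List<String> results);
        void showError(String message);
    }

    // All decision-making lives in the presenter, which never touches the DOM.
    public static final class SearchPresenter {
        private final SearchView view;
        private final Function<String, List<String>> backend; // hypothetical service call

        public SearchPresenter(SearchView view, Function<String, List<String>> backend) {
            this.view = view;
            this.backend = backend;
        }

        public void onSearchClicked() {
            String query = view.getQuery().trim();
            if (query.isEmpty()) {
                view.showError("Please enter a search term");
                return;
            }
            view.showResults(backend.apply(query));
        }
    }

    // Fake view for tests: records what the presenter asked it to display.
    public static final class FakeView implements SearchView {
        public String query = "";
        public List<String> shown = new ArrayList<>();
        public String error;

        public String getQuery() { return query; }
        public void showResults(List<String> results) { shown = results; }
        public void showError(String message) { error = message; }
    }

    public static void main(String[] args) {
        FakeView view = new FakeView();
        SearchPresenter presenter = new SearchPresenter(view, q -> Arrays.asList(q + " result"));
        view.query = "  ";
        presenter.onSearchClicked();
        System.out.println(view.error); // Please enter a search term
        view.query = "wave";
        presenter.onSearchClicked();
        System.out.println(view.shown); // [wave result]
    }
}
```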