Thursday, August 20, 2015

JMH for Java MicroBenchmarks

Java Microbenchmark Harness (JMH) is an OpenJDK project. As the project home page says, it's a harness for building, running and analysing benchmarks written in Java and other JVM languages.
I think writing benchmarks is a good tool to have in every developer's arsenal. To an extent, benchmarks are to performance what unit tests are to functionality. We want to know how our code performs, but we are very bad at predicting how well it will run. So what better way to find out than fact- and evidence-based tests, which is what benchmarks are. Benchmarks are great. But there's a dark side to this as well.
I love writing and running benchmarks. They give some sense of affirmation of how your code, or even some library, performs. But the bitter truth, and the dark side of benchmarks (or any performance test for that matter), is that they lie. There's no guarantee on the results you get from your benchmarks, due to many factors. These include the environment, optimisations (JVM/compiler) and how CPUs work (e.g. CPU cache misses and memory latencies). Overcoming the challenges posed by these factors is really hard. Sure, you can ignore them, but then everything's nothing but a lie.

It's not all doom-and-gloom though. These challenges are the very reason JMH has been built. Understanding these challenges will make you appreciate it even more. JMH uses different strategies to overcome these challenges and provides a nice set of annotations to use in our benchmark code.

The best way to get started with JMH is to go through the examples that are provided in the project home page. They can be found here.

For the sake of this write-up, and to highlight some of the key concepts, the following is a very basic example of adding an item to a CopyOnWriteArrayList.

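@State(Scope.Thread) // scope is an assumption; the original annotation was lost
public class CopyOnArrayListBenchmark {

    private List<String> benchMarkingList;

    @Setup
    public void setup() {
        benchMarkingList = new CopyOnWriteArrayList<String>();
    }

    @Benchmark
    public void benchMarkArrayListAddStrings(Blackhole blackhole) {
        // the added string is a placeholder
        blackhole.consume(benchMarkingList.add("foo"));
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(CopyOnArrayListBenchmark.class.getSimpleName())
                .warmupIterations(5)
                // further options from the original listing were lost
                .build();

        new Runner(options).run();
    }
}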

The annotations and the main method make this a JMH benchmark. There's a lot more that JMH provides, but I feel this is a very basic benchmark that covers some important points. A quick run-down of the annotations used:

@State is an annotation used to define state for the benchmark. Generally we want to maintain some sort of state in our benchmarks. The different scopes available are:

  • Thread - A benchmark gets its own object, field per thread
  • Group - A benchmark shares objects, fields amongst a thread group
  • Benchmark - A benchmark shares the objects, fields for the whole benchmark

In this example the state is defined as a default state, by annotating the benchmark class itself. The instance fields of the benchmark therefore have the state characteristics. The project examples show how separate state classes can be used instead.

@Setup is an annotation I find very useful. It's like the @Before annotation we'd normally use in JUnit tests. It gives us the opportunity to initialise fields and objects without impacting the actual benchmark. If not for this, we'd use various techniques to initialise objects and inadvertently include the setup cost in what we measure.

The method annotated with @Benchmark is the actual benchmark. The results captured are for the code executed in this method. An important concept used in this method is the Blackhole, which is there to avoid dead code. Dead-code elimination is one of the key challenges in benchmarks (it is caused by compiler optimisations): if the result of a computation is never used, the compiler may remove the computation altogether. JMH provides infrastructure to prevent or minimise this. In the example above, I could have simply returned the boolean that List.add(T) returns. Returning a value from a benchmark method tells JMH to consume it, which limits dead-code elimination. However, if several values are produced, only one of them can be returned, and the compiler might treat the code producing the others as dead code. This is where Blackholes come in handy. By passing each value to Blackhole.consume, we sink the values so that the code producing them is not eliminated. The example does not need a Blackhole (it produces only one value); it's only shown to highlight this feature.

And then we have the main method, which sets up the benchmark to run. The key highlight here is the various options it provides. I've used a few of the options I use frequently, but there are more. These options also have annotations that can be used instead.

JMH provides many benefits. However, the following are the ones that stood out for me and got me using it more and more.

  • Optimisation proof/aware:
Does a lot to address dead-code elimination, constant folding and loop unrolling.
  • False sharing
There have been many mentions about how false sharing is a silent performance killer. JMH has recognised this and has built-in support to prevent it.
  • Forking
JVMs are very good at optimising code based on runtime profiles, and the profile gathered for one benchmark can influence how another is compiled when both run in the same JVM. Running the benchmarks in separate forked JVMs helps eliminate these issues, and JMH therefore forks tests by default. We do, however, have the option of saying how many forks we want.
  • Data setup
Many times when writing benchmarks I have worried about initialisation time contributing to the result. The @Setup annotation takes care of this and is really useful.
  • Warming up
Warm-ups are crucial for benchmarks. Results gathered from a test vary (sometimes significantly) depending on how many times it has been run. This is another instance where optimisations get in the way of benchmarks. I have often needed to run the code under test in a loop several times, purely for warming up, so that I could get a more sensible result. This is cumbersome, and running benchmarks within loops has its own problems (loop unrolling). So the ability to just state the number of warm-up iterations we need is pretty cool.
  • Running tests multiple times
Like warm-ups, the option to set the number of measurement iterations is pretty cool as well. If not for this, we'd typically run the test in a loop.
  • Support for threads
Running benchmarks across multiple threads is a lot of hard work. I have used barriers to get all threads prepped and then release them to run together. This still doesn't give us much confidence that all threads are in fact running as we hope, since thread scheduling is not something we control. Plus, all the Runnable wrappers and cyclic barriers that might be used make the benchmark ugly, hard to read and error-prone. So the option in JMH of just saying how many threads I want the test to run on is just awesome.
  • State per thread/benchmark
Maintaining state during benchmarks can be very tricky, especially for tests that we want to run on multiple threads. So the @State annotation comes in very handy and makes life so much better.
  • Parameters
This I find is another super cool feature of JMH. Just by using a @Param annotation and specifying an array of parameters, the test is repeated for all parameters given. This is a very nice and concise way to declare the scope of the parameters under test. A nice example can be found here.
  • Artificially consume CPU
Sometimes when we run a benchmark across multiple threads, we also want to burn some CPU cycles to simulate the CPU being busy while our code runs. This can't be a Thread.sleep, as we really want to burn CPU. Blackhole.consumeCPU(long) gives us the capability to do this.
  • Measurement modes
Gathering results for benchmarks can be for throughput or latency (or both at times). Capturing results and getting them to produce the percentiles requires careful thought as well. So JMH provides a very convenient way to set the measurement mode. 
  • Built in profilers (Not your fully blown profiler. Use it with care)
JMH has this nice little feature where you can run a stack-trace or GC profiler. They are not your commercial profilers, but they are a good way to get some indication of which part of the code is taking the most time and what kind of GC activity takes place when running the tests. I find these profilers a nice starting point when running benchmarks, before diving further into a fully fledged profiler.
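For contrast, the hand-rolled approach that the warm-up and measurement-iteration points describe might look like the sketch below. It uses only the JDK; the class name, method name and iteration counts are made up for illustration. It deliberately keeps the weaknesses listed above (a timing loop the JIT is free to transform, no forking, no protection against dead-code elimination), which is the work JMH takes off our hands.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class HandRolledBenchmark {

    // Runs the operation under test `iterations` times and returns the elapsed nanoseconds.
    public static long timeAdds(List<String> list, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            list.add("item"); // the operation being measured
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 5_000; // arbitrary count, chosen only for illustration

        // Warm-up: give the JIT a chance to compile the hot path; the timing is discarded.
        timeAdds(new CopyOnWriteArrayList<String>(), iterations);

        // Measurement: a fresh list, so data added during warm-up does not skew the timing.
        long elapsed = timeAdds(new CopyOnWriteArrayList<String>(), iterations);
        System.out.println("avg ns/op: " + (elapsed / (double) iterations));
    }
}
```

The numbers this prints vary from run to run and from machine to machine, so treat them as no more than a rough indication.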

As stated above, the examples are a great way to get cracking on writing benchmarks using JMH. I also found the resources by Aleksey Shipilëv (the guy who's responsible for JMH) to be very useful. He goes into detail about the challenges and how JMH solves them in this video.

As a closing remark, it's still worth noting that benchmarks should not be taken as absolutes. They are an indication, and a close representation, of the throughput/latency of the code under test. I find them very useful as a relative measure when I refactor code. So write benchmarks, run them, analyse the results and do it regularly.

Tuesday, May 31, 2011

Maven Releases - To Run or Not to Run (tests)

Firstly, I think Maven is great (yeah I said it). Yes it can be a bit of a pain in the backside at times, but who/what isn't. Now that I've got that off my chest, a quick look at maven releases. Specifically, running tests during maven releases.

Maven has a release plugin that can be used to produce and publish release artifacts. When you have a maven based project, simply running

mvn release:prepare release:perform

presents a series of questions and ends up producing and publishing release artifacts for your project(s). Leaving the details of the 'prepare' and 'perform' phases out of this write-up (they are sufficiently documented), this post dwells on running tests during these releases.

When one runs the above, maven runs all tests by default. This is because both phases run goals that execute tests: 'prepare' by default executes the 'clean verify' goals, while 'perform' by default executes 'deploy'. This means that if you have 100s of tests, they will run twice.

While this seems completely normal, it can be argued that once the preparation has been done (i.e. all source code compiled, tests passed, scm tags created), it really isn't necessary to run the same tests again. While there are compelling reasons why you might still want to run them, this was one of the things that bugged us when we ran maven releases. Especially when you run prepare and perform together, it seems reasonable to be able to avoid running the tests a second time. When a project has 100s of integration/functional tests that get executed as part of a release, this means a lot of time spent running tests that we already know have passed.

And it so happens the release plugin does provide the capability to avoid tests running during the perform phase if one wants to do so. It's all down to the plugin configuration.

        deploy -Dmaven.test.skip=true

The above configuration makes sure that tests don't run as part of the 'perform' phase of your maven release, thereby saving you as many minutes as it takes to run the tests. While not running tests in the perform phase might not be the preferred choice, the plugin is configurable enough for us to make it play how we want it to play in case the second test run is seen as avoidable.

Thursday, February 24, 2011

Multiple Web Applications - One Spring Context

It's normal to have multiple web applications deployed together as a complete solution that serves a product or enterprise. And if these web applications use spring, it's normal to see a separate spring context associated with each of them. This is because, although spring beans are singletons, they are not singletons per VM when multiple class-loaders are involved. When multiple web applications are deployed (typically in a J2EE container or servlet container), each web application has its own class-loader. And when the spring beans of each web application get loaded, they are singletons only within that class-loader.

While this is perfectly acceptable, it would be nice (and beneficial) to have these spring beans be singletons across all web applications. This would mean that the web applications share the same spring context. Spring does provide this capability out of the box, and it is easily configurable.

One of the pre-requisites for this capability, however, is to have all web applications packaged and deployed as an EAR. This means the deployment container needs to be a J2EE container (like WebLogic, JBoss, Glassfish etc) and not simply a servlet container like Tomcat.

The key to this deployment model is the class-loader hierarchy that we get from an EAR deployment. An EAR deployment, which would typically have multiple web applications (WAR files) and a shared application library (with multiple JAR files), works on a class-loader hierarchy. Below the standard system/bootstrap class-loaders of the application server, an EAR has a top-level class-loader (say this is the Application class-loader) and a bunch of class-loaders as its children. These child class-loaders are associated with the web applications. It's standard in an EAR deployment to package all jar files that are shareable by all web applications under a directory that is put on the class-path through the manifest file under META-INF. All classes loaded by the Application class-loader are visible to the web application class-loaders. But if a web application contains any jar file under its own lib directory, its classes won't be accessible to the Application class-loader, and certainly not to the other web applications within the EAR.

So with regards to sharing spring beans, this means that we can place the jar files for all spring beans in the shared application library. These then get loaded by the Application class-loader, and each web application has access to them, which should result in a shared spring context. Not really. For spring, this alone is not enough to achieve a shared context.
A web application is spring configured through its web.xml, using either a ContextLoaderListener or a ContextLoaderServlet (depending on the servlet API implemented by your container). Typically we'd use the 'contextConfigLocation' parameter to specify the location of our spring bean configurations.


Assuming the above spring configuration consists of beans specific to the web application's concerns (validators, controllers), to have a bunch of beans share a single spring context we use the 'locatorFactorySelector' and 'parentContextKey' parameters. We simply add the following into our web.xml(s)



The above would mean that you would have a file called common-beans.xml in the classpath for the web application, which has the following bean configured;


The above configuration defines the 'commonContext' bean. This is a bean of type 'ClassPathXmlApplicationContext', a convenience class used to build application contexts. The list passed into the constructor is a list of configuration file locations, and the beans are loaded from the definitions given in those files.
With a configuration like the one above in each web application of an EAR, all web applications will share the same beans configured through the 'commonContext' bean.

This approach can bring a few advantages to your deployment architecture, maintenance and development:

From a deployment architecture point of view,
  • If using hibernate in the mix of things, we can benefit from a single SessionFactory. If caching is configured, there is one cache shared by all web applications, which saves on heap usage.
  • Reduces the number of classes to load, hence saving on your permgen.

From a maintenance and development point of view,
  • Common beans can be configured in one place, in one configuration file that is used by the others. There's no need to duplicate the bean declarations in multiple spring configuration files.

As mentioned previously, this is only possible if the deployment model is EAR based. If all web applications are deployed as their own WAR files, there's no concept of a class-loader hierarchy with which to achieve the above.

While the decision on whether to EAR or not is a separate set of notes, if EAR packaging is chosen for an application, spring does provide the capability to reap benefits from that decision.


Saturday, January 29, 2011

Spring transactions readOnly - What's it all about

When using spring transactions, it's stated that using 'readOnly' allows the underlying data layers to perform optimisations.
When using spring transactions in conjunction with hibernate and the HibernateTransactionManager, this translates to an optimisation applied on the hibernate Session. When persisting data, a hibernate session works based on the FlushMode set on the session. A FlushMode defines the flushing strategy that synchronises database state with session state.
We look at a class with transactions demarcated as follows
@Transactional(propagation = Propagation.REQUIRED)
public class FooService {
    public void doFoo() {
        // doing foo
    }

    public void doBar() {
        // doing bar
    }
}
The class FooService is demarcated with a transaction boundary, and all operations in FooService will have the transactional attributes specified by the @Transactional annotation at the class level. An operation within a transaction (or starting a transaction) sets the session FlushMode to AUTO and is identified as a read-write transaction. This means that the session state is sometimes flushed to make sure the transaction doesn't suffer from stale state.
However, if in the above example, we'd have the doBar() simply performing some read operations through hibernate, we wouldn't want hibernate trying to flush the session state. And the way to tell hibernate not to do this is through the FlushMode. In this instance the above example turns out as follows
@Transactional(propagation = Propagation.REQUIRED)
public class FooService {
    public void doFoo() {
        // do foo
    }

    @Transactional(readOnly = true)
    public void doBar() {
        // do bar
    }
}
The above change in doBar() forces the session FlushMode to be set to NEVER (named MANUAL in later hibernate versions). This means that we won't have hibernate trying to synchronise the session state within the scope of the session used in this method. After all, it would be a waste to perform session synchronisation on a read operation. One thing to note in this configuration is that we are indeed spawning a new transaction. This happens by applying the @Transactional annotation.
However, this is only true if doBar() is called from a client who has not initiated or participated in a transaction. In other words, if doBar() is called within doFoo() (which has started a transaction), then the readOnly aspect won't have any effect on the FlushMode. This is due to the fact that @Transactional uses Propagation.REQUIRED as the default propagation strategy, so in this instance doBar() participates in the transaction started by doFoo() and does not override any of its transaction attributes.
If for some reason doBar() still needs to have readOnly applied within an existing read-write transaction, then the propagation strategy for doBar() needs to be set to Propagation.REQUIRES_NEW. This forces the existing transaction to be suspended and a new transaction to be created, which also sets the FlushMode to NEVER. Once it exits the new transaction, it continues the first transaction with the FlushMode set back to AUTO (following the transaction propagation model). I can't think of a scenario which would need such a configuration though.

While the readOnly attribute can also provide hints to underlying jdbc drivers, where supported, the implications of this attribute can vary based on the underlying persistence framework (Hibernate, Spring JPA, Toplink or even raw JDBC) in use. The above details aren't necessarily true for all approaches, it's only valid in the spring+hibernate land.

Another way of looking at all of this is to ask 'Do we really need to spawn a transaction and then mark it as read-only for a pure read operation at all?' The answer is probably 'No', and yes, it's best to avoid transactions for pure read operations. However, readOnly can come in handy depending on the approach one picks to demarcate transactions. For example, take a service implementation that has its transactional properties defined at the class level and configured as read-write transactions. If the majority of the operations of this service implementation share these transactional properties, it makes sense to apply them at the class level. Now if there are a couple of operations that are actually read operations, the readOnly attribute comes in handy for configuring only those operations as @Transactional (readOnly = true). This is still not perfect, given that it creates a transaction for a read operation. So for this example another configuration might be @Transactional (propagation = Propagation.SUPPORTS, readOnly = true), which means a new transaction will not be created if one does not exist (relatively more efficient) while readOnly is still applied for other optimisations (possibly in the drivers).

It all boils down to the fact that spring has the readOnly attribute as an option to use. But the when, where and why of using it is entirely up to the developer, depending on the design to which an application is built. So it is quite useful to know what the 'readOnly' attribute is all about in order to put it to proper use.

Monday, January 24, 2011

Too many open files

I've had my share of dreaded moments of the 'Too many open files' exception. When you see '[]: Too many open files' it can be somewhat tricky finding out what exactly is causing it. But not so tricky if you know what needs to be done to get to the root cause.

First up though what's this exception really telling us?
When a file/socket is accessed in linux, a file descriptor is created for the process that deals with the operation. This information is available under /proc/process_id/fd. The number of file descriptors allowed is however restricted. Now if a file/socket is accessed, and the stream used to access it is not closed properly, then we run the danger of exhausting the limit of open files. This is when we see 'Too many open files' cropping up in our logs.
However, the fix for the root cause will vary depending on what's been uncovered. It's easy if it's an error made in your own code base (simply because you can fix your own code easily, at least in theory) and harder if it's a third-party library or, worse, the jdk (not so bad if it's documented though).

So what do we do when this is upon us?
As with any other thing, find the root cause and cure it.
In order to find the root cause of this exception, the first thing to find out is which files are open and how many of them are open when the exception occurs. The 'lsof' command is your friend here.
shell>lsof -p [process_id]
(To state the obvious the process id is the pid of your java process)
The output of the above can be 'grep'd to find out which files are repeated and keep increasing in number as the application runs.

Once we know the file(s), and if that file/socket is accessed within our code, it's a no-brainer. Go fix the code. This could be something as simple as not closing a stream properly, like so;
public void doFoo() throws IOException {
    Properties props = new Properties();
    props.load(new FileInputStream("file.txt")); // the stream is never closed
}
The stream above is not closed. This would result in an open file handle against file.txt.
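One way to close the stream reliably is try-with-resources. The following is a minimal sketch assuming Java 8 or later; the class name PropertiesLoader and the demo() helper are made up for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class PropertiesLoader {

    // Loads a properties file and always releases the file descriptor.
    public static Properties load(Path file) {
        Properties props = new Properties();
        // try-with-resources closes the stream even if load() throws
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Self-check: writes a temporary file, loads it and returns the value stored under "foo".
    public static String demo() {
        try {
            Path tmp = Files.createTempFile("demo", ".properties");
            try {
                Files.write(tmp, "foo=bar\n".getBytes(StandardCharsets.ISO_8859_1));
                return load(tmp).getProperty("foo");
            } finally {
                Files.deleteIfExists(tmp); // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints bar
    }
}
```

With the stream closed on every path, repeated calls no longer add entries to the lsof output for the process.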
If it's third-party code and you have access to the source, you have to put yourself through the somewhat painful process of finding the buggy code in order to work out what can be done about it.
In some situations third-party software requires increasing the hard and soft limits on the number of file descriptors. This can be done in /etc/security/limits.conf by changing the figures for the hard and soft values like so;
*   hard   nofile   1024
*   soft   nofile   1024
The '*' is best replaced by the specific user (or @group) to which the change applies.
Some helpful information about lsof

Saturday, September 11, 2010

Not one thread, but many threads to do my work

It's often the case where our code executes in a single thread of execution. Take for an example a web application running on a servlet container. Sure the container is multi-threaded, but as far as one request is concerned, it's one thread that executes all code in its path to complete the request.
This is the norm and is usually sufficient for most cases. Then there are certain use cases where, once the application is put together (or while designing it, if one pays good attention to design that is), you feel that parts of the logic are simply waiting for their turn on the thread although they are perfectly capable of running without depending on each other.

Take for example a search solution that searches multiple data sources in order to present the results to the end-user. Assume that the data sources are one or more relational databases, an index of some sort (think lucene) and an external search service. The search solution is expected to take in a user search term, search across all of these data sources and provide an aggregated set of results. And that's exactly what the code would do. The code would search one data source at a time, perhaps wrap the results in a common value object, and aggregate them so that they are ready for the user. So in the typical single-threaded execution path, the thread would execute each search one at a time and finally run the results through the aggregation. Concurrently speaking though, this seems a bit wasteful. The searches against the data sources have no dependency on each other. The only element that's dependent in this execution path is the aggregation, which needs all results in order to do its magic. This means that if we run all searches concurrently and then pass the results to the aggregator, we have a good chance of saving some time.
The first thing that dawns upon us with this approach is multiple threads. We want the search to spawn multiple threads, search each data source on its own thread, and pass the results to the aggregation logic. This achieves concurrent execution of the searches, the benefits of which can be very noticeable.

Messing around with multiple threads, however, is something most avoid doing due to the inherent complexities. Thread pooling, thread termination, synchronisation, deadlocks and resource sharing are a few of the complexities that a developer focused on application development might tend to stay away from. But with the inclusion of the java concurrent package in the JDK, things just got simpler. Thanks to Doug Lea, the java.util.concurrent package includes out-of-the-box thread pooling, queues, mutexes, efficient thread initialisation and destruction, efficient locking etc. It is basically a comprehensive set of building blocks for application developers to easily introduce concurrency into the application flow.

While the above use case is an ideal candidate for adopting a concurrent execution model in an application, a simple framework at - shows a simple usage of the API to achieve concurrency. It was put together to run concurrent tasks in various use cases in a recent project. The code is quite straightforward, and the test provided along with it does nothing but compute a fibonacci series and factorials. The computations are however made deliberately heavy so that response times are long.

Touching the gist of it, the main building blocks employed of the java.util.concurrent package are briefly discussed here.

An ExecutorService provides the means of initiating, managing and shutting down threads. The jdk provides several implementations of this, and a factory utility called Executors provides the methods to create them. The API documentation of the different types of ExecutorService is a good point of reference (as always). Typically you would gain access to an ExecutorService as follows;

ExecutorService execService = Executors.newCachedThreadPool(); 
This would give you an ExecutorService which has a cached thread pool.
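The sketch below follows the full life cycle of such a pool: create it, hand it one task, wait for the result and shut it down. It assumes Java 8 or later (for the lambda), and the class and method names are hypothetical. The shutdown() call matters because the idle threads of a cached pool otherwise linger until they time out, which can delay JVM exit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorLifecycle {

    // Runs one task on a pool thread, waits for its result, then shuts the pool down.
    public static int runOnPool() {
        ExecutorService execService = Executors.newCachedThreadPool();
        try {
            Callable<Integer> task = () -> 6 * 7;              // work executed on a pool thread
            Future<Integer> future = execService.submit(task);
            return future.get();                               // blocks until the task is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();                // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());     // the task itself failed
        } finally {
            execService.shutdown();                            // no new tasks; idle threads may exit
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnPool()); // prints 42
    }
}
```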

The Callable interface is similar to the Runnable interface (in the thread api), so implementors of this interface can run on a different thread. What differentiates it from Runnable is that a Callable returns a value and may throw an exception (if there's one to be thrown, of course).

public class Processor implements Callable<AsyncResult> {
    // implementing the contracted method
    public AsyncResult call() throws ConcurrentException {
        // implementation placed here
    }
}

Like the run() method of the Runnable interface, the Callable interface has a call() method that implementors are contracted to implement. As per the example above, our implementation returns an AsyncResult when call() is called. The call() method is invoked by the ExecutorService, and this implementation can be thought of as a task that you'd want executed in its own thread.

Putting all together:
Now we've got the task that we want invoked in a separate thread (the Callable implementation), and we've got an ExecutorService that dishes out threads and manages the thread execution. To kick it all off, all we have to do is;

List<Processor> tasks = // build a new list of Processor objects
List<Future<AsyncResult>> results = execService.invokeAll(tasks);

The invokeAll(Collection) method takes care of submitting all tasks (Processor objects in our case) for execution across multiple threads concurrently. As mentioned above, the call() method is consulted to kick off the execution of each of those threads.

The next important thing about the invokeAll(Collection) method call is its return value: a list of Future objects. A Future is a holder of the result of an asynchronous process. This is quite a powerful concept, as we can now take control of the results of multiple threads. A simple get() method provides the means of retrieving the desired result. As listed above, the type of the result is what the call() method returns. The get() method blocks if the result is not yet available. Thus the result of each task is accessible and can be put to use thereafter.

Putting all together:
A quick look at how all of these lines relate to the main thread of execution and the multiple threads spawned by the ExecutorService

List<Processor> tasks = // build a new list of Processor objects // main thread
List<Future<AsyncResult>> results = execService.invokeAll(tasks); // main thread spawns multiple threads

for (Future<AsyncResult> result : results) {
    AsyncResult res = result.get(); // main thread. Would block if necessary
    // do something with the result
}
That's it. Using these building blocks from the API, we have a nice, straightforward way to invoke our work concurrently. While the usage of the API is simple, applying concurrency needs to be done carefully. It needs to be thought about while designing the application, and testing under load and stress for thread and resource utilisation is a must. Once implemented, a run through a profiler to see how each thread behaves should be on the list of things to do as well. Also worth noting is that the tooling support for the concurrent programming model (i.e. debugging) is not quite mature as yet.
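The snippets above depend on Processor and AsyncResult, which are not shown here. As a self-contained sketch of the same flow, the example below swaps in a hypothetical FactorialTask that returns a Long; the class names are made up for illustration. Note that invokeAll itself waits for all tasks to complete, so the get() calls that follow return straight away.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentFactorials {

    // A task that computes n! on whichever pool thread picks it up.
    static class FactorialTask implements Callable<Long> {
        private final int n;

        FactorialTask(int n) {
            this.n = n;
        }

        @Override
        public Long call() {
            long result = 1;
            for (int i = 2; i <= n; i++) {
                result *= i;
            }
            return result;
        }
    }

    // Computes the factorial of every input concurrently; results come back in input order.
    public static List<Long> factorials(int... inputs) {
        ExecutorService execService = Executors.newCachedThreadPool();
        try {
            List<FactorialTask> tasks = new ArrayList<>();
            for (int n : inputs) {
                tasks.add(new FactorialTask(n));
            }

            // main thread hands the tasks to the pool and waits for all of them
            List<Future<Long>> futures = execService.invokeAll(tasks);

            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get()); // already complete after invokeAll
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause()); // a task threw an exception
        } finally {
            execService.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(factorials(3, 5, 10)); // prints [6, 120, 3628800]
    }
}
```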

The example contains a test that demonstrates the usage of the simple framework and it's accessible here - The test in the example shows perhaps the most obvious advantage of using the java.util.concurrent package, which is speed-up through parallelism.

Tuesday, March 16, 2010

ThreadLocal - What's it for

ThreadLocal provides a mechanism for storing objects for the current thread of execution. What this means is that you can stick an object into a ThreadLocal and expect it to be available at any point during the execution of that thread. This comes in handy when you have to make some object available across different layers (i.e. presentation->business->data access) without making it part of the api. For example, implementations of cross-cutting concerns such as security and transactions make use of ThreadLocal to access security and transactional contexts between layers, without cluttering the application APIs with non-application specific details.

What happens in the ThreadLocal
ThreadLocal is backed by a custom hash map (ThreadLocalMap) that is not exposed as part of the API. Each thread carries its own instance of this map, in which the ThreadLocal is the key and the stored object is the value. So it's basically a hash map carried through the current thread of execution. That's all there is to it, in a nutshell, as to what goes on inside the ThreadLocal.
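The per-thread nature of the storage can be seen with a small sketch. The class and method names here are hypothetical; the point is only that a value set on one thread is invisible to another thread, even though both go through the same ThreadLocal instance.

```java
class ThreadLocalIsolationSketch {

    static final ThreadLocal<String> STORE = new ThreadLocal<String>();

    // Sets a value on a new thread and returns what that thread reads back.
    static String setAndReadOnNewThread(final String value) {
        final String[] seen = new String[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                STORE.set(value);      // stored against the worker thread only
                seen[0] = STORE.get();
            }
        });
        worker.start();
        try {
            worker.join(); // after join() the worker's write to seen[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        STORE.set("main");
        System.out.println(setAndReadOnNewThread("worker")); // prints worker
        System.out.println(STORE.get());                     // prints main
    }
}
```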

How to make use of a ThreadLocal
A ThreadLocal is generally used through a custom holder class. The holder should have a static field of type ThreadLocal and would generally expose static accessors to store and retrieve objects to and from it.
Assume you want to store a FooContext object in the ThreadLocal as part of your application.
Following is an example of a basic implementation to use the ThreadLocal:

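public class ThreadLocalHolder {

    // One shared ThreadLocal; each thread sees only its own FooContext.
    private static final ThreadLocal<FooContext> store = new ThreadLocal<FooContext>();

    public static void set(FooContext context) {
        store.set(context);
    }

    public static FooContext get() {
        return store.get();
    }
}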

To see how this is used, consider a business layer object called FooBusinessObject and a DAO called BarDao. You want some method in the business object to initialise a FooContext, do some work and call the DAO. You then want to be able to access the FooContext you created without making it part of the API of the DAO.

public class FooBusinessObject {

    private BarDao barDao;

    public void doFoo() {
        // Some logic that builds a FooContext
        FooContext context = new FooContext();

        // sticks the context into the ThreadLocal.
        ThreadLocalHolder.set(context);

        // Do more work....
        barDao.processBar();
    }
}

public class BarDao {

    public void processBar() {
        // Get the context from the thread-local
        FooContext context = ThreadLocalHolder.get();

        // do some work...
    }
}

That's all that's needed to be able to store and retrieve objects in the ThreadLocal. ThreadLocal is used in many frameworks and is a popular choice for use cases of sharing cross cutting objects during the execution of a thread. 

Is there any danger involved?
There are potential dangers that you could face by using ThreadLocals though. However, these dangers are not because of merely using the ThreadLocal, but because it's not used properly (as with many other things). The issues become apparent when we work in a managed environment, such as an application server that maintains a thread pool and dishes out threads for incoming requests. This means one request gets hold of a thread, finishes the work involved, and then the application server reclaims the thread back into the pool. Now if you stuck an object in the ThreadLocal, it doesn't get removed when the thread returns to the pool unless it is explicitly cleaned out. So you've now mixed the state of a request that has finished into the next request served by that thread.
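The stale-state problem is easy to reproduce outside an application server. The following is a minimal sketch, assuming a single-thread executor as a stand-in for the server's thread pool and two anonymous tasks as stand-ins for two requests; all names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StaleThreadLocalSketch {

    static final ThreadLocal<String> STORE = new ThreadLocal<String>();

    // Returns what the second "request" finds in the ThreadLocal.
    static String valueSeenBySecondRequest() {
        // One pooled thread, so both requests are served by the same thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // First request stores a value and never clears it.
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    STORE.set("context of request 1");
                }
            }).get();

            // Second request stores nothing, yet still finds a value.
            return pool.submit(new Callable<String>() {
                @Override
                public String call() {
                    return STORE.get();
                }
            }).get();
        } catch (Exception e) {
            throw new IllegalStateException("request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenBySecondRequest()); // prints context of request 1
    }
}
```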
So as part of using this properly, the thread local needs to be cleared out before the execution ends. This is ideally done in a finally block where the ThreadLocal is cleared.
The following addition to the ThreadLocalHolder exposes a clear method that cleans out the ThreadLocal for the current thread.

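public class ThreadLocalHolder {

    private static final ThreadLocal<FooContext> store = new ThreadLocal<FooContext>();

    public static void set(FooContext context) {
        store.set(context);
    }

    public static FooContext get() {
        return store.get();
    }

    public static void clear() {
        // Removes the current thread's value
        store.remove();
    }
}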
And this is cleared as follows:

public class FooBusinessObject {

    private BarDao barDao;

    public void doFoo() {
        try {
            // Some logic that builds a FooContext
            FooContext context = new FooContext();

            // sticks the context into the ThreadLocal.
            ThreadLocalHolder.set(context);

            // Do more work....
            barDao.processBar();
        } finally {
            // clean up
            ThreadLocalHolder.clear();
        }
    }
}

ThreadLocal usage is a very useful technique for providing per-thread contextual information. Used properly, it's a valuable method that will come in handy where applicable.
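As a closing sketch of the clean-up pattern on a pooled thread: a single-thread executor stands in for the application server's pool, and the first task clears the ThreadLocal in a finally block before the thread is reused. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ClearedThreadLocalSketch {

    static final ThreadLocal<String> STORE = new ThreadLocal<String>();

    // Returns what the second "request" finds after the first one cleaned up.
    static String valueSeenBySecondRequest() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // First request: set, work, and always clear in finally.
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    STORE.set("context of request 1");
                    try {
                        // ... request work that reads STORE.get() ...
                    } finally {
                        STORE.remove(); // clean up before the thread goes back to the pool
                    }
                }
            }).get();

            // Second request runs on the same pooled thread.
            return pool.submit(new Callable<String>() {
                @Override
                public String call() {
                    return STORE.get();
                }
            }).get();
        } catch (Exception e) {
            throw new IllegalStateException("request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenBySecondRequest()); // prints null
    }
}
```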