MANIFEST.MF must be the first resource in a jar file – here’s how to fix broken jars

November 15, 2011

Some tools, like the Sling OSGi Installer, require the MANIFEST.MF to be the first file in a jar file, or they won’t find it.

This happens, for example, when using the java.util.jar.JarInputStream class to read a jar’s manifest: its getManifest() method returns null when the manifest is not at the beginning of the archive.

The manifest is also where OSGi bundle headers are found, so not having it in the right place makes the jar unusable as a bundle.
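To check whether a given jar has this problem, here’s a minimal sketch using JarInputStream – the ManifestCheck class name is mine, just for illustration. If getManifest() returns null on a jar that you know contains a MANIFEST.MF, the manifest is in the wrong place:

import java.io.FileInputStream;
import java.util.jar.JarInputStream;
import java.util.jar.Manifest;

// Checks whether JarInputStream can see a jar's manifest - if MANIFEST.MF
// is not at the beginning of the archive, getManifest() returns null.
public class ManifestCheck {
    public static void main(String[] args) throws Exception {
        try (JarInputStream in = new JarInputStream(new FileInputStream(args[0]))) {
            Manifest m = in.getManifest();
            if (m == null) {
                System.out.println("No manifest found - probably not the first entry in " + args[0]);
            } else {
                // the OSGi bundle headers mentioned above live in the main attributes
                System.out.println("Bundle-SymbolicName: "
                    + m.getMainAttributes().getValue("Bundle-SymbolicName"));
            }
        }
    }
}

Run it as java ManifestCheck broken.jar before and after applying the fix below to verify the result.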

I won’t discuss here whether this requirement is part of the jar file spec (it would make sense, as it ensures the manifest can be read quickly even from a huge jar file), but in any case there are many situations where it is required.

To fix a jar where this is not the case, you need to unpack the jar and recreate it, as in this example, starting from a broken.jar in the current directory:

$ mkdir foo
$ cd foo
$ jar xvf ../broken.jar
$ mv META-INF/MANIFEST.MF /tmp/mymanifest
$ jar cvfm ../fixed.jar /tmp/mymanifest .

That’s it – creating the new jar with the jar utility puts the MANIFEST.MF in the right place. (Creating fixed.jar in the parent directory avoids archiving the new jar into itself.)


Quick notes from Mark O’Neill’s Transfer Summit 2011 talk

September 8, 2011

Here are my quick on-the-spot notes on Mark O’Neill’s (@marxculture) excellent presentation about how a government’s IT can innovate.

Citing Andrew Savory: I never thought I’d see a government spokesperson be entertaining and informative. Reminds me of my excellent experiences working for the Swiss Parliament services in the late nineties.

Here are my unedited notes – slides should be available soon. URLs and emphasis added by myself.

UK gov spends 20 billion pounds a year on IT, with 20 top suppliers.

Mark’s got a 20MB mailbox to manage this.

Different velocities: technology, business, society.

His role: track and align to those velocities.

Velocity of change in his government IT world is “very different” from what it is on the outside. The gap is increasing.

The more you diverge from the velocity of the market, the higher your costs are.

The options are: do the same, do nothing or ask a different question.

The challenge: no money – although they’re spending 20 billion pounds a year…

The government is currently driven by the policy cycle: problem -> draw up a policy -> interpret and implement policy -> monitor and evaluate. The customer is not part of this picture. Need an innovation model.

He shows a picture of a washing machine for dogs: there is an e-petition about this. The e-petition site (http://epetitions.direct.gov.uk/) received 16k such petitions in 4 weeks, with 3.8 million visitors, 20 million page views and 1.5 million signatures. The site was built in 6 weeks by 6 people and cost 80 thousand pounds, including security costs. The system will be released as open source shortly.

The future: small projects are developed by his office, up to 3 months with contracted companies; if it takes more than 3 months, you need to ask different questions.

Spending 20 billion pounds a year should create one of the most dynamic, diverse and entrepreneurial markets in the world. What needs to be done to achieve that?

Leaner procurement mechanisms. Use open standards and open source. Look at the processes, check how much paperwork and overhead is actually necessary.

The challenge to new projects includes two questions: what is it that you want to do, and why do you want to do it that way?

Decompose the project into smaller units of work – this also gives SMEs more chances to jump in.

Three layers: infrastructure, app, support. Procurement is currently often based on a complete silo – moving to smaller units can help reuse and sharing.

Rethink the approach to make it possible to buy an excellent application from an SME that does not provide infrastructure or services.

Working on G-Cloud, private onshore cloud for UK government services. (http://www.cloudbook.net/directories/gov-clouds/uk-government-cio-council).

Talking of “agile” and “cloud” is a bit like talking about “magic” these days ;-)

Conversation always trumps process.

Two key differences between agile and waterfall.

First difference: with waterfall you talk to developers, they go away for six months, come back with something that you don’t want. Agile: you talk all the time.

Second difference: with waterfall, you might not be around anymore when problems arise. With agile, you could be fired after three months…

Tools are hard. Discuss, share, build, learn. You cannot deliver success without efficient tools – take example from the open development teams of the outside world.

Need to build mechanisms to learn, share and discuss what the team is doing.

Success factors:

1) Be the most dynamic, diverse and entrepreneurial market in the world.

2) IT should just work.

3) Reuse, dialogue, agility, ownership should be part of the day-to-day business.


Turning 42, and why I love my job

August 24, 2011

I’ll be turning 42 in a few months (counting in base 12 of course) and it feels like a good time to reflect on what it is that makes my job enjoyable. My father was a carpenter, and both my brothers started with that as their first job, so I’m kind of the disruptive element of the family (I didn’t say black sheep, ok?).

So, why did I choose to work with cold electronics (my first degree) and computers instead of working with a noble and beautiful living thing like wood?

After some thinking I came up with four key elements.

The first key element is creating cool things. Note that I don’t say creating cool software: I realized that for me the creative process is more important than what exactly is being created. Coolness is obviously a subjective measurement, so it’s hard to define precisely. Lean and maintainable software that people find useful definitely falls into that category for me.

Next is working with bright and fun people. Being active in the Apache Software Foundation, and joining Day in 2007 made me realize how stimulating it is to work with people that impress you every day with their technical and other skills. People who are fun to work with help keep some distance with the Big Problems At Work. Technical and other problems are bound to happen in any job, and that’s when your colleagues’ attitudes make all the difference. Software and work are not always the most important things in life.

Using efficient and fun tools comes next – in my previous life as an independent software developer and architect I sometimes had to put up with lame environments and tools at customer sites, and that can be depressing when you’re aiming for quality and efficiency. My first grade math teacher kept saying that good craftsmen use good tools, and she was right!

The fourth element is keeping a good work-life balance. I tend to engage myself 100% in my work, but for that to happen I need to be able to engage myself 100% in other things at regular intervals. This often means disconnected holidays “away from the grid”. I also decided long ago to never work on Sundays, unless there’s really no other way, which is rare. This has helped me keep my sanity during those phases when the rest of the week is totally crazy.

The fun thing is that those four elements would totally apply to being a carpenter…and I actually did enjoy helping at my father’s shop during school holidays when I was a kid. I’m not planning on going back though – now that my son has learned carpentry as well, he makes fun of me every time I try!


How to fix your project collaboration model?

August 5, 2011

I’ve been studying the collaboration processes of the Apache Software Foundation (ASF) for a while [1], by observing and practicing the successful collaboration model that we use in ASF projects.

Taking the opposite angle, I’ve been reflecting lately on how to fix a broken collaboration model. How do you help teams move from an “I have no idea what my colleagues are doing, and I get way too much email” model to the efficient self-service information flow that we see in ASF and other open source projects?

As I see it, the success of the Apache collaboration model is based on six guiding principles:

  • If it didn’t happen on the dev list, it didn’t happen.
  • Code speaks louder than words.
  • Whatever you’re working on, it must be backed by an issue in the tracker.
  • If your file is not in subversion it doesn’t exist.
  • Every important piece of information has a permanent URL.
  • Email is where information goes to die.

Some of those might need to be slightly tweaked for your particular context, but I believe you can apply most of them to all collaborative projects, as opposed to just software development.

The context of an ASF project is a loose group of software developers, architects, testers, release managers and users (with many people playing several of these roles) working remotely, with no central decision authority. There’s no manager in an ASF project, no money exchanged and no common corporate culture. Decisions are made by consensus, by a group that grows organically as new members are elected based on their merit with respect to that particular project.

This may sound like a recipe for failure, yet many of our projects are very successful in delivering great software consistently. How does that work?

Let’s describe our six principles in more detail, so that you can see if they apply to your own project.

If it didn’t happen on the dev list, it didn’t happen.

In an ASF project, all decisions are made on a single developers’ mailing list.

Backchannel discussions are inevitable: someone will meet a coworker at the coffee machine, people attend conferences, talk on IRC, etc. There’s nothing wrong with that, but the rule is that as soon as a discussion turns into something that has impact on the project, it must come back to the dev list. The goal is for all project participants to have the same information from a single source.

As those lists are archived, you can then go back to check why and how a particular decision was made.

Creating consensus using email only can be hard; on the other hand it forces you to clarify your ideas and express them in an understandable way, which in the case of software often promotes cleaner architectures and designs (but see also code speaks louder than words below).

Email etiquette (solid and evolving subject lines, concise writing, precise quoting etc.) plays a big role in making this work, and that’s something people have to learn.

In an ASF project, experienced members will often help beginners improve their list communications skills, by gently steering them towards more efficient writing and messaging styles. And your message might just be ignored if the subject line says “URGENT: need help!” or “Question”, which can be a good beginner’s lesson.

Top-posting is usually frowned upon on ASF lists, as that often leads to superficial discussions compared to the much more precise inline quoting.

Email is where information goes to die.

Aren’t we contradicting the previous principle here?

Not really – the dev list is for the flow of discussions and decisions, not for important information that people need to refer to in their work.

Even with multiple good email archives like the ones we have for ASF projects, finding a specific piece of information in email is painful. That might be good enough for going back to a discussion to get more context about what happened, and for revisiting decisions (marked by [VOTE] in their subject lines at the ASF) from time to time, but definitely not for the day-to-day information that you need in such a project. That’s where the issue tracker, discussed below, comes into play.

Code speaks louder than words.

No one told me that when I started working in ASF projects, but after some time trying to argue about software architecture and how my ideas would improve our project’s software, I realized that I was just wasting my time.

The best way to express a software architecture or an idea that will improve a software component is to implement it and show the code to your fellow project members.

That code might be just a rough and ugly prototype in any suitable programming language, that’s not a problem – as long as it expresses your ideas, you will often spend much less time getting that idea across when it’s backed by a concrete piece of code.

Whatever you’re working on, it must be backed by an issue in the tracker.

This might be the most important of our guiding principles – on a desert island, I think I’d be ready to work with just an issue tracker as my collaboration channel.

Many people think that issue trackers are only for bugs: you create an issue when you have run your software and found something that doesn’t work.

Although that was the original intention, I believe (and I’ve seen this fly in many different contexts) that an issue tracker can be used to back all the work that’s being done in a software or other project. Software design, implementation, test servers and continuous integration setups and maintenance, hardware upgrades, password reset requests, milestones like demos and sprints (as issues, not just dates)…and bugs of course: managing all project tasks in a single tracker makes a huge difference.

When used as your coordination backbone, a good issue tracker (we currently use JIRA and Bugzilla at the ASF) will help answer a lot of questions about your project’s status, such as

  • Who’s working on what?
  • What did Joe Developer do last month?
  • Where do we stand?
  • What prevents X from being implemented?
  • Why did we implement X in this way back in 2001?

For this to work, project members need to update their issues often, and break them down into smaller issues as needed, so that the tracker later tells the story of how the task went from A to B.

No need for big literary statements, but instead of sending an email to the dev list saying that X is ready to be used, just update the corresponding issue in the tracker. A good tracker will send events when that happens, to which people can subscribe. Together with the source code control system events (commits etc.) this creates a live activity stream for your project. Making extensive use of the tracker will help provide a complete picture of what’s happening, in that stream.

I’m also a big fan of using issue dependencies to organize issues in trees that can be used to keep track of what needs to be done to reach the project’s goals (aka “planning”, but in a more organic form).

[Image: Cygwin dependency tree]

As an example, here’s the dependency tree for bug 3383 of the cygwin project. That’s not an ASF project, which shows that we’re not the only ones to use this technique.

That tree starts with “the FRYSK project” as an umbrella issue, which is broken down into issues that represent the different project areas, and so on until it reaches the actual things that need to be implemented or fixed. Combined with a tracker’s reporting functions, such a tree helps tremendously in answering the “where do we stand?” question, and in reshuffling priorities quickly in a crisis. You can also create umbrella issues for sprints or releases, to express what needs to be done to get there.
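As a sketch of how such a tree supports reporting, here’s a small Java example – the Issue class and the PROJ-* keys are hypothetical, my own illustration rather than any real tracker’s API – that answers the “where do we stand?” question by counting resolved leaf issues under an umbrella issue:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an issue dependency tree, not a real tracker API.
public class IssueTree {

    static class Issue {
        final String key;
        final String summary;
        boolean resolved;
        final List<Issue> children = new ArrayList<Issue>();

        Issue(String key, String summary) {
            this.key = key;
            this.summary = summary;
        }

        Issue add(Issue child) {
            children.add(child);
            return this; // allows chaining when building the tree
        }

        // Leaves are the actual work items; an umbrella issue is "done"
        // to the extent that its leaf issues are resolved.
        int[] leafCounts() { // returns {resolved, total}
            if (children.isEmpty()) {
                return new int[] { resolved ? 1 : 0, 1 };
            }
            int done = 0, total = 0;
            for (Issue c : children) {
                int[] counts = c.leafCounts();
                done += counts[0];
                total += counts[1];
            }
            return new int[] { done, total };
        }
    }

    public static void main(String[] args) {
        Issue fixParser = new Issue("PROJ-4", "Fix the parser");
        fixParser.resolved = true;
        Issue umbrella = new Issue("PROJ-1", "The umbrella project")
            .add(new Issue("PROJ-2", "Area: core").add(fixParser))
            .add(new Issue("PROJ-3", "Area: docs"));
        int[] counts = umbrella.leafCounts();
        // prints "PROJ-1 The umbrella project: 1 of 2 leaf issues resolved"
        System.out.printf("%s %s: %d of %d leaf issues resolved%n",
            umbrella.key, umbrella.summary, counts[0], counts[1]);
    }
}

A real tracker gives you this for free through its reporting functions; the point is simply that once all the work hangs off a single tree, “where do we stand?” becomes a mechanical question.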

If your file is not in subversion it doesn’t exist.

The goal here is to avoid relevant files floating around – put everything in subversion, or in whatever source code control system you’re using.

If some files are really too large for that, make sure they have permanent URLs, and document those in the issue tracker or in your source code.

Every important piece of information has a permanent URL.

If you cannot point to a piece of information with a URL, it’s probably not worth much. Or you’re using the wrong system to store that information.

I also like to speak in URLs as much as possible. Saying SLING-505 (which is an agreed upon shortcut for https://issues.apache.org/jira/browse/SLING-505 in an ASF project) is so much more precise than referring to the background servlets feature.

Conclusion

Reviewing your collaboration model against the above principles should help find out where it can be improved. The most common problems that I’ve seen in the various organizations that I worked for in my career are the “split brain” problem, where testers, customers or managers use different tracking systems, and the “way too much email” problem where people send important information (or code…aaaaarghhhh) around in email as opposed to taking advantage of a tracker.

Does that sound like you? If so, you might want to have a closer look at how successful open projects work, and take inspiration from them.

[1] See previous posts: What makes Apache tick? and Open Source collaboration tools are good for you!.


Becoming an Apache project is a process, not just a decision

June 1, 2011

ZDNet is talking about the ASF accepting or rejecting a new project – let me mention that this is not exactly how things work.

The only way to create new projects at Apache is through the Incubator, so if project FOO wants to join the Apache Software Foundation, that can only take the form of a proposal for incubation (see examples at http://wiki.apache.org/incubator/).

That proposal is then discussed and sometimes questioned by the Incubator PMC (Project Management Committee), in the open on the general@incubator.apache.org mailing list, until consensus is reached about the proposal and initial committers and mentors team.

A so-called podling is then created to manage the incubating project, and it’s only after successful graduation (based mostly on community aspects and “legal cleanliness” of the code and podling releases) that a project becomes a “real” Apache project.

So, becoming an Apache project is not just a decision of the Apache Incubator, it’s a process that can take from a few months to much longer, depending on the code base and on the community that forms around it.

Apache Subversion is an example of a project that zoomed through the Incubator, mostly because it was already operating like an Apache project, and its code already fulfilled Apache’s legal requirements. There are also many examples of podlings which stay way too long in the Incubator, either because their code or their community isn’t developing as it should.

ZDNet is speculating about OpenOffice in this case – wouldn’t such an important project get different treatment? I don’t think so: the Apache process and governance rules have been proven over time on a few earth-shaking projects already (the HTTP Server, Apache Lucene, Apache Solr and Apache Hadoop, to name a few), so no project is likely to get special treatment – our current process works just fine.


+1! ™ – a rosy financial future for the Apache Software Foundation

April 1, 2011

Google recently announced their +1 button, which will without a doubt make the Internet a better place. What’s not to like +1?

As everybody knows, the +1 concept was invented at the Apache Software Foundation (ASF) – and it seems there’s even an ASF patent pending on it. (Update: see also here, via @jaaronfarr.) Our voting process makes extensive use of this simple and very effective concept.

If you do the math, the bandwidth (and thus power, greenhouse gas, etc.) saved by writing +1 instead of “I agree” in all our emails does make a difference for the planet – it’s not just a fun gimmick.

In recognition of this invention, usually well informed sources tell us, Google is going to donate 3.141592654 cents (yeah that’s Pi – they’re Google, you know) to the ASF every time someone uses their +1 button, starting today!

That’s excellent news for the ASF – as with any volunteer organization, more funds mean more action, more power and more fun! I haven’t yet been able to estimate how much money those N*Pi +1 clicks represent in a year, but it’s certainly in the pile of money range.

A small downside is that we’ll need to use +1(tm), with the trademark sign, from now on. That’s a small price to pay for what looks like a rosy financial future for the ASF.

Very impressive move – thanks, Google! The Open Source world should mark today’s very special date with a white stone, as we say in French.

+1(tm)!


glow.mozilla.org: smoke and mirrors, and RESTful design

March 22, 2011

[Image: Glow screenshot]

When I was a kid, my aunt gave me a book called The Art of Engineering. The title sounded weird to me at first – isn’t engineering the opposite of art?

It’s not – artful design can be visible in the best pieces of software, and not only at the user interface level. I find the realtime display of Firefox 4 downloads by glow.mozilla.org fascinating, and being my curious self I wondered how the data is transferred.

Starting with the requirement of broadcasting real-time data to millions of clients simultaneously, many of us would end up with expensive message queuing systems, RPC, WebSockets, SOAP^H^H^H^H (not SOAP – don’t make me cry). Lots of fun ways to add some powers of ten to your budget.

Don’t believe anyone who tells you that software has to be complicated, or that engineering cannot be artful. Simplicity always wins, and glow.mozilla.org is an excellent example of that.

The first thing that I noticed when looking at how glow gets its data (which was very easy thanks to the use of sane http/json requests) is that glow is not real-time.

I’d call it smoke-and-mirrors real-time: the client just requests a new batch of data points every minute, and the server can change this interval at any time, which can be very handy if traffic increases. Fetching slightly stale data every minute is more than enough for a human user who doesn’t care about up-to-the-second numbers, and it makes the system quite a bit simpler.

The client makes two kinds of regular data requests. The first goes to a URL like http://glow.mozilla.org/data/json/2011/03/21/14/42/count.json. The path already tells you a lot about what this is, which, although not required, is often a sign of good RESTful design.

The response contains an array of data points (number of downloads per minute), along with two very important items that control the data transfer:

{
   "interval":60,
   "data":[
      [
         [
            2011,3,21,13,43
         ],
         1349755
      ],
      [
         [
            2011,3,21,13,44
         ],
         1350332
      ],
      ...
   ],
   "next":"2011/03/21/14/43/count.json"
}

The interval tells the client when to ask for data next, and the next item is the path to the next batch of data. At least that’s what I assume – I haven’t checked the client code in detail, but it seems obvious.

Using URLs and data that seem obvious is the essence of the Web, and of a good RESTful design. Using RPC, WebSockets or any other supposedly more sophisticated mechanism would bring nothing to the user, and would only make things more complicated. Being able to throttle data requests from the server side using the interval and next items is very flexible, obvious, and does not require any complicated logic on the client side.
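To make that concrete, here’s a minimal sketch of such a polling client in Java – the GlowPoller class and its naive string extraction are my own illustration, not Mozilla’s actual client code; the only thing taken from the real system is the interval/next contract shown in the response above:

import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;

// Hypothetical polling client following the interval/next contract
// seen in the count.json responses - a sketch, not Mozilla's code.
public class GlowPoller {

    public static void main(String[] args) throws Exception {
        String base = "http://glow.mozilla.org/data/json/";
        String path = "2011/03/21/14/42/count.json"; // first batch, as in the example above

        while (true) {
            String json = fetch(base + path);
            long interval = extractLong(json, "\"interval\":"); // seconds before the next poll
            path = extractString(json, "\"next\":\"");          // server-chosen path of the next batch
            // ...hand the "data" array to the display code here...
            Thread.sleep(interval * 1000L); // the server can raise the interval if traffic grows
        }
    }

    static String fetch(String url) throws Exception {
        try (InputStream in = new URL(url).openStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    // Naive extraction to keep the sketch dependency-free;
    // a real client would use a proper JSON parser.
    static long extractLong(String json, String key) {
        int i = json.indexOf(key) + key.length();
        int j = i;
        while (j < json.length() && Character.isDigit(json.charAt(j))) j++;
        return Long.parseLong(json.substring(i, j));
    }

    static String extractString(String json, String key) {
        int i = json.indexOf(key) + key.length();
        return json.substring(i, json.indexOf('"', i));
    }
}

Note how all the scheduling intelligence stays on the server: the client blindly follows next and sleeps for interval seconds, so the server can slow down millions of clients just by changing two JSON values.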

The second data URL looks like http://glow.mozilla.org/data/json/2011/03/21/14/42/map.json, and if my quick analysis is correct it returns geographic coordinates of the dots that represent geolocated downloads. It uses the same interval/next mechanism for throttling requests.

All in all, an excellent example of engineering smoke and mirrors applied in the right way, and of simple and clean RESTful design. No need for “sophisticated” tools when the use case doesn’t really require them. Kudos to whoever designed this!

Update: The Mozilla team has more details on their blog. Thanks to Alex Parvulescu for pointing that out.