Chapter 1: Introduction

CHAPTER 1

Introduction

WHEN COREL BOUGHT WordPerfect for almost $200 million from the Novell Corporation in the mid-1990s, nobody would have thought that within a matter of months it would be giving away the source code for free. However, when Corel ported WordPerfect to Java and released it as a beta product, a simple program called Mocha¹ could quickly and easily reverse engineer, or decompile, significant portions of Corel’s Office for Java back into source code.

Decompilation is the process that transforms machine-readable code into a human-readable format. When an executable, a Java class file, or a DLL is decompiled, you don’t quite get the original format; instead you get a type of pseudo source code, often incomplete and almost always without the comments. But often what you get is more than enough to understand the original code.

The purpose of this book is to address an unmet need in the programming community. For some reason, the ability to decompile Java has been largely ignored even though it is relatively easy for anyone with the appropriate mindset to do. In this book, I would like to redress the balance by looking at what tools and tricks of the trade are currently being employed by people who are trying to recover source code and those who are trying to protect it using, for example, obfuscation.

This book is for those who want to learn Java by decompilation, those who simply want to learn how to decompile Java into source code, those who want to protect their code, and finally those who want to better understand Java bytecodes and the Java Virtual Machine (JVM) by building a Java decompiler.

This book takes your understanding of decompilers and obfuscators to the next level by

Exploring Java bytecodes and opcodes in an approachable but detailed manner.
Using examples that show you what to do when an applet only partially decompiles.
Providing you with simple strategies you can use to show users how to protect their code.
Showing you what it takes to build your own decompiler.

Compilers and Decompilers

Computer languages were developed because most normal people cannot work in machine code or its nearest equivalent, Assembler. Thankfully, we realized pretty early in computing technology that humans just weren’t cut out to program in machine code. Computer languages, such as Fortran, COBOL, C, Visual Basic, and more recently, Java and C#, were developed to allow us to put our ideas in a human-friendly format that can then be converted into a format that a computer chip can understand.

At its most basic, the compiler’s job is to translate this textual representation (source code) into a series of 0s and 1s (machine code), which the computer can interpret as actions or steps that you want it to perform. It does this using a series of pattern-matching rules. A lexical analyzer tokenizes the source code,² and any mistakes or words that are not in the compiler’s lexicon are rejected immediately. These tokens are then passed to the language parser, which matches one or more tokens to a series of rules and translates these tokens into intermediate code (some early versions of Visual Basic, Pascal, and Java) or sometimes straight into machine code (C and Fortran). Any source code that doesn’t match a compiler’s rules is rejected and the compilation fails.
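The tokenizing step described above can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions: the class name TinyLexer, the three token categories, and the tiny set of accepted symbols are all invented for this example, and a real compiler’s lexical analyzer is far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a toy lexical analyzer, not how javac works.
final class TinyLexer {

    // Splits source text into identifier, number, and single-character symbol tokens.
    // Any character outside this tiny lexicon is rejected immediately.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char ch = source.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++; // whitespace only separates tokens
            } else if (Character.isLetter(ch)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add("IDENT(" + source.substring(start, i) + ")");
            } else if (Character.isDigit(ch)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER(" + source.substring(start, i) + ")");
            } else if ("+-*/=;()<".indexOf(ch) >= 0) {
                tokens.add("SYMBOL(" + ch + ")");
                i++;
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + ch + "' at offset " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the token list for a single statement.
        System.out.println(tokenize("c = c + 1;"));
    }
}
```

A parser would then match runs of these tokens against grammar rules; the sketch stops at the token list because that is the point at which unknown words are rejected.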

So now you know what a compiler does. Well, to be honest, you’ve only scratched the surface; compiler technology has always been a specialized, and sometimes complicated, area of computing. Modern advances mean things are going to get even more complicated, especially in the virtual machine domain. In part, this drive comes from Java and now .NET. Just-in-time (JIT) compilers have tried to close the gap between Java and C++ execution times by optimizing the execution of Java bytecodes. This seems like an impossible task because Java bytecode is, after all, interpreted, whereas C++ is compiled. But JIT compiler technology is making significant advances, and incorporating these advances is also making Java compilers and virtual machines much more complicated beasts.

From your point of view, you need to know that most compilers do a lot of preprocessing and post-processing. The preprocessor readies the source code for the lexical analysis by stripping out all unnecessary information, such as the programmer’s comments, and adding in any standard or included header files or packages. A typical post-processor stage is code optimization, where the compiler parses or scans the code, reorders it, and removes any redundancies, which will hopefully increase the efficiency and speed of your code.

Decompilers, no big surprise here, translate the machine code or intermediate code back into source code. In other words, the whole process is reversed. Machine code is tokenized in some way and parsed or translated back into source code. This transformation rarely results in original source code because some information is lost in the pre- and post-processing stages.

Take the analogy of idioms in human languages, which are often the most difficult part of a sentence or phrase to translate. My favorite idiom is L’esprit d’escalier, which literally translates as the wit of the staircase. But what it really means is that perfect witty comment or comeback that pops into your head half an hour too late. Similarly (and I know I’m stretching it a bit here) source code can often be translated into machine code in more than one way. Java source code is designed for humans and not computers, and often some steps may be redundant or can be performed more quickly in a slightly different order. Because of these lost elements, few (if any) decompilations result in the original source.

Virtual Machine Decompilers

Several notable attempts have been made to decompile machine code; Christina Cifuentes’ dcc is one of the most recent.³ However, at the machine code level, the data and instructions are commingled, and it is much more difficult, but not impossible, to recover the original code.

In a virtual machine, the code has simply passed through a preprocessor and the decompiler’s job becomes one of simply reversing the preprocessing stages of compilation. This makes interpreted code much, much easier to decompile. Sure, there are no comments, and worse still, no specification, but then again, there are also no research and development (R&D) costs.

Why Java?

The original JVM was designed to run on a TV cable set-top box. As such, it was a very small stack machine that pushed and popped its instructions on and off a stack using only a limited instruction set. This made the instructions very easy to understand with relatively little practice. Because the compilation process was a two-stage process, the JVM also required the compiler to pass on a lot of information, such as variable and method names, that would not otherwise be available. These names could be almost as helpful as comments when you were trying to understand decompiled source code.
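To make the idea of a stack machine concrete, here is a minimal sketch of one. The instruction names (push, add, mul) and the class name ToyStackMachine are invented for this illustration and are not JVM opcodes; the only point is that every instruction works by pushing values onto a stack or popping them off it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only: a toy stack machine with invented instruction names.
final class ToyStackMachine {

    // Executes a program such as {"push 2", "push 3", "add"} and returns the top of the stack.
    static int run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.trim().split("\\s+");
            switch (parts[0]) {
                case "push":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "add": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "mul": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4 and prints 20.
        System.out.println(run(new String[] {"push 2", "push 3", "add", "push 4", "mul"}));
    }
}
```

With no registers to track, the meaning of each instruction depends only on what is currently on the stack, which is a large part of why such code is easy to read back.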

The current design of the JVM is independent of the Java 2 Software Development Kit (SDK). In other words, the language and libraries may change, but the JVM and the opcodes are fixed. This means that if Java is prone to decompilation now, then it is always likely to be prone to decompilation. In many cases, as you shall see, decompiling a Java class is as easy as running a simple DOS or Unix command.

In the future, the JVM may very well be changed to stop decompilation, but this would break any backward compatibility and all current Java code would have to be recompiled. And although this has happened before in the Microsoft world with different versions of Visual Basic, a lot more companies than Sun develop virtual machines.

JVMs are now available for almost every operating system and web browser. In fact, Java applets and applications can run on any computer or chip from a mainframe right down to a handheld or a smartcard as long as a JVM and appropriate class libraries exist for that platform. So it’s no longer as simple as changing one JVM.

What makes this situation even more interesting is that companies that want to Java-enable their operating system or browser usually create their own JVMs. Sun is now only really responsible for the JVM specification. It seems that things have now progressed so far that any fundamental changes to the JVM specification would have to be backward compatible. Modifying the JVM to prevent decompilation would require significant surgery, and in all probability, it would break this backward compatibility, thus ensuring that Java classes will decompile for the foreseeable future.

It’s true that no such compatibility restrictions exist on the Java SDK, where more and more functionality is added almost daily. And the first crop of decompilers did dramatically fail when inner classes were first introduced in the Java Development Kit (JDK) 1.1. However, this isn’t really a surprise because Mocha was already a year out of date when 1.1 was released and other decompilers were quickly modified to recognize inner classes.

TOP TEN REASONS WHY JAVA IS MORE VULNERABLE TO DECOMPILATION

For portability, Java code is partially compiled and then interpreted by the JVM.
Java’s compiled classes contain a lot of symbolic information for the JVM.
Because of backward compatibility issues, the JVM’s design is not likely to change.
The JVM has very few instructions or opcodes.
The JVM is a simple stack machine.
Standard applets and applications have no real protection against decompilation.
Java applets are typically small and therefore intelligible without comments.
Larger Java applications are automatically compiled into smaller modular classes.
Java applets are typically downloaded for free.
Java hype and cutthroat competition equal plenty of applications and plenty of people willing to decompile them.

So unlike other Java books, I don’t expect that this book will go out of date with the next release of the JDK. Sure, some extra features may be added, but the underlying architecture will remain the same. Let’s begin with a simple example in Listing 1-1.

Listing 1-1. Simple Java Source Code Example

public class Casting {
   public static void main(String args[]){
        for(char c=0; c < 128; c++) {
                System.out.println("ascii " + (int)c + " character " + c);
        }
   }
}

Listing 1-2 shows the output for a simple class file whose source is shown in Listing 1-1 using javap, Sun’s class file disassembler that came with the original versions of Sun’s JDK. You can decompile Java so easily because, as you’ll see later in the book, the JVM is a simple stack machine with no registers and a limited number of high-level instructions or opcodes.

Listing 1-2. Javap Output

Compiled from Casting.java
public synchronized class Casting extends java.lang.Object
    /* ACC_SUPER bit set */
{
    public static void main(java.lang.String[]);
/* Stack=4, Locals=2, Args_size=1 */
    public Casting();
/* Stack=1, Locals=1, Args_size=1 */
}
 
Method void main(java.lang.String[])
   0 iconst_0
   1 istore_1
   2 goto 41
   5 getstatic #12 <Field java.io.PrintStream out>
   8 new #6 <Class java.lang.StringBuffer>
  11 dup
  12 ldc #2 <String "ascii ">
  14 invokespecial #9 <Method java.lang.StringBuffer(java.lang.String)>
  17 iload_1
  18 invokevirtual #10 <Method java.lang.StringBuffer append(char)>
  21 ldc #1 <String " character ">
  23 invokevirtual #11 <Method java.lang.StringBuffer append(java.lang.String)>
  26 iload_1
  27 invokevirtual #10 <Method java.lang.StringBuffer append(char)>
  30 invokevirtual #14 <Method java.lang.String toString()>
  33 invokevirtual #13 <Method void println(java.lang.String)>
  36 iload_1
  37 iconst_1
  38 iadd
  39 i2c
  40 istore_1
  41 iload_1
  42 sipush 128
  45 if_icmplt 5
  48 return
 
Method Casting()
   0 aload_0
   1 invokespecial #8 <Method java.lang.Object()>
   4 return

It should be obvious that a lot of the source code information exists in a class file; my aim is to show you how to take this information and reverse engineer it into source code. However, in many cases, Java classes won’t decompile without some extra effort; you’ll need to understand the underlying design and architecture of a Java classfile and the JVM itself, which is what I’m going to provide you with in the remainder of this book.
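As a taste of how stack-based bytecode can be turned back into an expression, consider the instructions at offsets 36 to 39 of Listing 1-2 (iload_1, iconst_1, iadd, i2c). The sketch below "executes" them on a stack of strings rather than a stack of values. It is a deliberately tiny illustration, not the approach developed later in the book: the class name ExpressionRecovery and the placeholder variable name local1 are assumptions, and only the handful of opcodes shown are handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only: recovers an expression from a few integer opcodes
// by pushing and popping source-like strings instead of runtime values.
final class ExpressionRecovery {

    static String recover(String[] opcodes) {
        Deque<String> stack = new ArrayDeque<>();
        for (String opcode : opcodes) {
            switch (opcode) {
                case "iconst_0":
                    stack.push("0");
                    break;
                case "iconst_1":
                    stack.push("1");
                    break;
                case "iload_1":
                    // "local1" is a placeholder name for local variable slot 1.
                    stack.push("local1");
                    break;
                case "iadd": {
                    String right = stack.pop();
                    String left = stack.pop();
                    stack.push("(" + left + " + " + right + ")");
                    break;
                }
                case "i2c":
                    stack.push("(char) " + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Opcode not handled in this sketch: " + opcode);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Prints: (char) (local1 + 1)
        System.out.println(recover(new String[] {"iload_1", "iconst_1", "iadd", "i2c"}));
    }
}
```

The recovered text, (char) (local1 + 1), followed in the listing by istore_1, corresponds to the c++ in the for loop of Listing 1-1. Recovering control flow such as the loop itself takes considerably more work.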

History: Basic Chronology

Since before the dawn of the humble PC . . . . Scratch that. Since before the dawn of COBOL, decompilers have been around in one form or another. In fact, you have to go all the way back to ALGOL to find the earliest example of a decompiler. Donnelly and Englander wrote D-Neliac at the Naval Electronic Labs (NEL) in 1960. Its primary function was to convert non-Neliac compiled programs into Neliac-compatible binaries. Neliac was an ALGOL-type language that stood for the Navy Electronics Laboratory International ALGOL Compiler.

Over the years, there have been other decompilers for COBOL, Ada, Fortran, and many other esoteric as well as mainstream languages running on IBM mainframes, PDP/11s, and Univacs, among others. Probably the main reason for these early developments was to translate software or convert binaries to run on different hardware.

More recently, reverse engineering and the Y2K problem have become the acceptable face of decompilation. Converting legacy code to get around the Y2K problem often required disassembly or full decompilation. Reverse engineering is a huge growth area that has not disappeared since the turn of the millennium. Problems caused by the Dow Jones hitting the 10-thousand mark—ah, such fond memories—and the introduction of the Euro have all caused financial programs to fall over.

Even without these developments reverse engineering techniques are being used to analyze old code, which typically has thousands of incremental changes, in order to remove any redundancies and convert these legacy systems into much more efficient animals.

At a much more basic level, hexadecimal dumps of PC machine code have always given programmers extra insight into how something is achieved or into how to break any artificial restrictions placed on the software. Magazine CDs were either time-bombed or had restricted copies of games; these could be patched to change demonstration copies into full versions of the software using primitive disassemblers such as the DOS debug command.

Anyone well versed in Assembler can learn to quickly spot patterns in code and bypass the appropriate source code fragments. Pirate software is a huge problem for the software industry; disassembling the code is just one technique employed by the professional or amateur bootlegger. Hence the downfall of many an arcane copy protection technique.

However, the DOS debug command and hexadecimal editors are primitive tools and it would probably be quicker to write the code from scratch than to try to re-create the source code from Assembler. For many years now, traditional software companies have also been involved in reverse engineering software. They have studied new techniques, and their competition has copied these techniques all over the world using reverse engineering and decompilation tools. Generally, this is accomplished using in-house decompilers, which are not for public consumption and are definitely not going to be sold over the counter.

It’s likely that the first real Java decompiler was actually written at IBM and not by Hanpeter Van Vliet, author of Mocha. Daniel Ford’s whitepaper Jive: A Java Decompiler, dated May 1996, appears in IBM Research’s search engines. This whitepaper just beat Mocha, which wasn’t announced until July 1996.

Academic decompilers such as the University of Queensland’s dcc are available in the public domain. Fortunately for the likes of Microsoft, decompiling Office using dcc would create so much code that it would be about as user friendly as Debug or a hexadecimal dump. Most modern commercial software’s source code is so large that it becomes unintelligible without the design documents and lots of source code comments. Let’s face it; many people’s C++ code is hard enough to read six months after they wrote it. So how easy would it be for someone else to decipher C code that came from compiled C++ code without any help, even if the library calls aren’t traversed?

What does come as a big surprise is the number of decompilers that are currently available but aren’t that well publicized. Decompilers or disassemblers are available for Clipper (Valkyrie), FoxPro (ReFox), Pascal, C (dcc and decomp), Ada, and, of course, Java. Even the Newton, loved by Doonesbury aficionados everywhere, isn’t safe.

Not surprisingly, decompilers are much more common for interpreted languages, such as Visual Basic, Pascal, or Java, because of the larger amounts of information being passed around. Some even have built-in dynamic compilers that regenerate source code on the fly, which is then subsequently recompiled into machine code, depending on the initial decompilation.

Visual Basic Decompilers

Let’s take a look at Visual Basic (VB), another interpreted language, as an example of what can happen to interpreted languages. Early versions of VB were interpreted by the vbrun.dll in a somewhat similar fashion to Java and the JVM; and just like a Java classfile, the source code for VB programs is also bundled within the binary. Bizarrely, Visual Basic 3 retains even more information than Java; this time even the programmer’s comments are included.

The original versions of VB generated an intermediate pseudocode, called p-code, which was also used in Pascal and originates in the P-System.⁴ And before you say anything, yes, Pascal and all its derivatives are just as vulnerable; this statement also includes early versions of Microsoft’s C compiler, just so that nobody else feels left out. The p-codes are not dissimilar to bytecodes and are essentially VB opcodes that are interpreted by vbrun.dll at run time. Ever wonder why you need to include vbrun.dll with VB executables? Well, now you know: you need to include vbrun.dll so that it can interpret the p-code and execute your program.

Doctor (Hans-Peter) Diettrich from Germany is the author of the eponymously titled DoDi—perhaps the most famous Visual Basic decompiler. These days DoDi—also known as Vbis3—is outdated because it only decompiles VB3 binaries, although there were rumors of a version for VB4. But because VB moved to compiled rather than interpreted code, the number of decompilers completely fell away.

At one time, Visual Basic also had its own culture of decompilers and obfuscators, or protection tools as they’re called in VB. Doctor Diettrich provides VBGuard for free on his site, and other programs, such as Decompiler Defeater, Protect, Overwrite, Shield, and VBShield, are available from other sources. But they too have all but disappeared with VB5 and VB6.

This was, of course, before .NET. With the arrival of .NET, we’ve once again come full circle and VB is once again interpreted. Not surprisingly, we’re already seeing decompilers and obfuscators such as the Exemplar and Anakrino decompilers as well as Demeanor and Dotfuscator.

Hanpeter Van Vliet

Oddly enough for a technical subject, this book also has a very human element. Hanpeter Van Vliet wrote the first public domain decompiler, Mocha, while recovering from a cancer operation in the Netherlands. He also wrote an obfuscator called Crema that attempted to protect an applet’s source code. If Mocha was the Uzi machine gun, then Crema was the bulletproof jacket. In a now classic Internet marketing strategy, Mocha was free, whereas there was a small charge for Crema.

The beta version of Mocha caused a huge controversy when it was first made available on Hanpeter’s web site, especially after it was featured in a c|net article. Because of the controversy, Hanpeter took the very honorable step of removing Mocha from his web site. He then held a vote about whether or not Mocha should once again be made available. The vote was ten to one in favor of Mocha, and soon after it reappeared on Hanpeter’s web site.

However, Mocha never made it out of beta, and while I was conducting some research for a Web Techniques article on this very subject, I learned from his wife that Hanpeter’s throat cancer finally got him. He died at the age of 34 on New Year’s Eve, 1996.

The source code for both Crema and Mocha was sold to Borland shortly before Hanpeter’s death, with all proceeds going to Hanpeter’s wife, Ingrid. Some early versions of JBuilder shipped with an obfuscator, which was probably Crema. This attempted to protect Java code from decompilation by replacing ASCII variable names with control characters.

I’ll talk more about the host of other Java decompilers and obfuscators later in the book.

Legal Issues

Before you start building your own decompiler, why don’t you take this opportunity to consider the legal implications of decompiling someone else’s code for your own enjoyment or benefit? Just because Java has taken decompiling technology out of some very serious propeller head territory and into more mainstream computing doesn’t make it any less likely that you or your company will get sued. It may make it more fun, but you really should be careful.

To start with, why don’t you try following this small set of ground rules:

Do not decompile an applet, recompile it, and then pass it off as your own.
Don’t even think of trying to sell a recompiled applet to any third parties.
Try not to decompile an applet or application that comes with a license agreement that expressly forbids decompiling or reverse engineering the code.
Don’t decompile an applet to remove any protection mechanisms and then recompile it for your own personal use.

Over the past few years, big business has tilted the law firmly in its favor when it comes to decompiling software. Companies can use a number of legal mechanisms to stop you from decompiling their software; these would leave you with little or no legal defense if someone discovered that you had decompiled their programs and you had to appear in a court of law. Patent law, copyright law, anti–reverse engineering clauses in shrinkwrap licenses, as well as a number of laws such as the Digital Millennium Copyright Act (DMCA) may all be used against you. Different laws may apply in different countries or states; for example, a “no reverse engineering” clause in a software license is null and void in the European Union (EU). But the basic concepts are the same: decompile a program for the purpose of cloning the code into another competitive product and you’re probably breaking the law.

The secret here is that you shouldn’t be standing, kneeling, or pressing down very hard on the legitimate rights (that is, the copyright rights) of the original author. That’s not to say that it is never OK to decompile. Certain limited conditions do exist where the law actually favors decompilation or reverse engineering through a concept known as fair use. From almost the dawning of time, and certainly from the beginning of the industrial age, many of humankind’s greatest inventions have come from an individual who has created something special while standing on the shoulders of giants. For example, both the invention of the steam train and the common light bulb were relatively modest incremental steps in technology. The underlying concepts were provided by other people, and it was up to Stephenson or Edison to create the final object. You can see an excellent example of Stephenson’s debt to many other inventors such as James Watt in the following timeline of the invention of Stephenson’s Rocket at http://www.usgennet.org/usa/topic/steam/timeline.html. This concept of standing on the shoulders of giants is one of the reasons why patents first appeared: to allow people to build on other creations while still giving the original inventor some compensation for their initial idea for a period of, say, 20 years.

In the software arena, trade secrets are typically protected by copyright law rather than through any patents. Sure, patents can protect certain elements of a program, but it is highly unlikely that a complete program will be protected by a patent or a series of patents. Software companies want to protect their investment, so they typically turn to copyright law or software licenses to prevent people from essentially stealing their research and development efforts.

Copyright law is not rock solid; if it was, there would be no inducement to patent an idea and the patent office would quickly go out of business. Copyright protection does not extend to interfaces of computer programs, and a developer can use the fair use defense if they can prove that they decompiled the program to see how they could interoperate with any unpublished application programming interfaces (APIs) in the program.

If you are living in the EU, then more than likely you work under the EU Directive on Legal Protection of Computer Programs. This states that you can decompile programs under certain restrictive circumstances, for example, when you are trying to understand the functional requirements you need to create a compatible interface to your own program. Or, to put it another way, if you need access to the internal calls of a third-party program and the authors refuse to divulge the APIs at any price, then, under the EU directive, you could decompile the code to discover the APIs. However, you’d have to make sure that you were only going to use this information to create an interface to your own program rather than create a competitive product. You also cannot reverse engineer any areas that have been protected in any way.

For many years Microsoft’s applications have allegedly gained unfair advantage from underlying unpublished API calls to Windows 3.1 and Windows 95 that are orders of magnitude quicker than the published APIs. The Electronic Frontier Foundation (EFF) has come up with a useful road map analogy to help explain this. Say you are trying to travel from Detroit to New York, but your map doesn’t show any interstate routes. Sure, you’d eventually get there traveling on the back roads, but the trip would be a lot shorter if you had the Microsoft map, complete with interstates. If these conditions were true, the EU directive would be grounds for disassembling Windows 2000 or Microsoft Office (MSOffice), but you’d better hire a good lawyer before you try it. Personally, I don’t buy it as I can’t believe MSOffice could possibly be any slower than it currently is, so if there are any hidden APIs, they certainly don’t seem to be having any impact on the speed of any of the MSOffice applications.

There are precedents that allow legal decompilation in the US too. The most famous case to date is Sega v. Accolade.⁵ In 1992, Accolade won a case against Sega in which the court ruled that Accolade’s unauthorized disassembly of the Sega object code was not copyright infringement. Accolade reverse engineered Sega’s binaries into an intermediate code that allowed them to extract a software key. This key allowed Accolade’s games to interact with Sega Genesis video consoles. Obviously Sega was not going to give Accolade access to APIs, or in this case, code, to unlock the Sega game platform. The court ruled in favor of Accolade, judging that the reverse engineering constituted fair use. But before you think this gives you carte blanche to decompile code, you might like to know that Atari v. Nintendo⁶ went against Atari under very similar circumstances.

In conclusion (see, you can tell this is the legal section), the court cases in the US and the EU directive stress that under certain circumstances reverse engineering can be used to understand how to interoperate with a program and to create a program interface. It cannot be used to create a copy to sell as a competitive product. Most Java decompilation will not fall into this interoperability category. It is far more likely that the decompiler wants to pirate the code, or at best, understand the underlying ideas and techniques behind the software.

It is not very clear if reverse engineering to discover how an applet was written would constitute fair use. The US Copyright Act of 1976’s exclusion of “any idea, procedure, process, system, method of operation, concept, principle or discovery, regardless of the form in which it is described” makes it sound like the beginning of a defense for decompilation, and fear of the fair use clause is one of the reasons why more and more software patents are being issued. Decompilation to pirate or illegally sell the software cannot be defended.

However, from a developer’s point of view, the situation looks bleak. The only protection, in the form of a user’s license, is about as useful as the laws against copying music CDs or audiocassettes. It won’t physically stop anyone from making illegal copies and it doesn’t act as any real deterrent for the home user. No legal recourse will protect your code from a hacker, and it sometimes seems that the people trying to create many of today’s secure systems must feel like they are standing on the shoulders of morons. You only have to look at the recent investigation into eBook protection schemes⁷ and the whole DeCSS fiasco⁸ to see how paper-thin a lot of the recent so-called secure systems really are.

Moral Issues

Decompiling Java is an excellent way to learn both the Java language and how the JVM works. It helps people climb up the Java learning curve because they learn by seeing other people’s programming techniques. The ability to decompile applets or applications can make the difference between a basic understanding of Java and an in-depth knowledge. Learning by example is one of the most powerful tools. It helps even more if you can pick your own examples and modify them to your own needs.

However, my book on decompiling would not be complete if I didn’t discuss the morality issues behind what amounts to stealing someone else’s code. In the early days of software, it was not uncommon to receive the source code with the product. But in the last few decades, market economics have taken over and this practice has almost disappeared with some notable open source exceptions such as GNU and Linux. But now, due to a certain set of circumstances, we find that Java comes complete with its source code.

The author, the publisher, the author’s agent, and his agent’s mother would like to state that we are not advocating that readers of this book decompile programs for anything other than educational purposes. The purpose of this book is to show readers how to decompile source code, but we are not encouraging anyone to decompile other programmers’ code and then try to use it, sell it, or repackage it as if it was their own. Please be careful that you do not try to reverse engineer any code that has a licensing agreement stating that you should not decompile it. It is not fair, and you’ll only get yourself in trouble. Having said that, there are thousands of applets on the Web, which when decompiled, will help you understand good and bad Java programming techniques.

To a certain extent, I’m pleading the “Don’t shoot the messenger” defense. I’m not the first to spot this flaw in Java, and I certainly won’t be the last person to write about the subject. My reasons for writing this book are, like the early days of the Internet, fundamentally altruistic. Or, in other words, I found this cool trick and I want to tell everyone about it.

Having said this, let me remind you that you can never be sure that decompiler-generated code is 100 percent accurate. So you’re in for a nasty surprise if you intend to use Java decompilation as the basis for your own products.

Protecting Yourself

Pirated software is a big headache for many software companies and big business for others. At the very least, software pirates could use decompilers to remove any licensing restrictions, but imagine the consequences if the technology were available to decompile Office 2000, recompile it, and sell it as a new competitive product. To a certain extent, that could easily have happened when Corel released the beta version of Corel’s Office for Java.

Perhaps this realization is starting to dawn on Java software houses. We are beginning to see two price scales on Java components: one for the classes and one for the source code. This is entirely speculative, but it seems that companies such as Sitraka (now Quest) realized that a certain percentage of their users would decompile their classes, and as a result, a few years ago Sitraka chose to sell the source code for JClass as well as other components. This makes any decompilation redundant as the code is provided along with the classes and it also makes some money for the developer by charging a little extra for the source code.

But is it all doom and gloom? Should you just resign yourselves to the fact that Java code can be decompiled, or is there anything you can do to protect your code? Here are some options:

License agreements
Protection schemes within your code
Code fingerprinting
Obfuscation
Intellectual Property Rights (IPR) protection schemes
Executable applications
Server-side code
Encryption

Although you’ll look at all these in more detail later, you should know that the first four act only as deterrents, whereas the last four are effective but have other implications. Let me explain.

License agreements don’t offer any real protection from a programmer who wants to decompile your code.

Spreading protection schemes throughout your code, such as by using combinations of getCodeBase and getDocumentBase or server authentication, is useless because they can be simply commented out of the decompiled code.
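To see why such checks are fragile, consider a minimal sketch. The class name LicensedWidget, the host www.example.com, and the method names are all hypothetical; in a real applet the host string would come from getCodeBase().getHost() or getDocumentBase().getHost().

```java
// Hypothetical sketch: LicensedWidget, ALLOWED_HOST, and both method names
// are invented for illustration and do not come from any real product.
final class LicensedWidget {
    private static final String ALLOWED_HOST = "www.example.com";

    // In an applet, 'host' would typically be getCodeBase().getHost()
    // or getDocumentBase().getHost().
    static boolean isAuthorizedHost(String host) {
        // equalsIgnoreCase returns false for null, so no extra null check is needed.
        return ALLOWED_HOST.equalsIgnoreCase(host);
    }

    static String start(String host) {
        // The entire protection scheme is this one branch.
        if (!isAuthorizedHost(host)) {
            return "unlicensed copy";
        }
        return "running";
    }
}
```

Once the class is decompiled, deleting or commenting out the single if statement in start and recompiling leaves the rest of the program fully functional.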

Code fingerprinting uses spurious code to watermark or fingerprint source code. It can be used in conjunction with license agreements, but it is only really useful in a court of law. Better decompilation tools will profile the code and remove any extra dummy code.

Obfuscation replaces the method names and variable names in a class file with weird and wonderful names. This is an excellent deterrent, but the source code is still visible, albeit in obfuscated form, when the better decompilers are used, so often this is not much better than compiling without the debug flag. HoseMocha, another obfuscator, works by adding a spurious pop bytecode after every return; it does nothing to the code but it does kill the decompiler. However, developers can quickly modify their decompiler once this becomes apparent, assuming they’re still around to make the changes.
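As a rough illustration of name obfuscation, the sketch below shows one small class twice: first with the names a programmer might have chosen, then roughly as a decompiler might present it after renaming. The class InvoiceCalculator and all identifiers are hypothetical, and the single-letter names only stand in for whatever a particular obfuscator would generate.

```java
// Before: hypothetical original source with meaningful names.
final class InvoiceCalculator {
    static double totalWithTax(double netAmount, double taxRate) {
        double taxAmount = netAmount * taxRate;
        return netAmount + taxAmount;
    }
}

// After: roughly what a decompiler could recover once the names have been
// replaced. The logic is untouched; only the identifiers have changed.
final class a {
    static double a(double a, double b) {
        double c = a * b;
        return a + c;
    }
}
```

The arithmetic is just as readable as before, which is why renaming slows a reader down without hiding what the code does.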

IPR protection schemes such as IBM’s Cryptolope Live!, InterTrust’s DigiBox, and Breaker Technologies’ SoftSEAL are normally used to sell HTML documents or audio files on some pay-per-view basis or pay-per-group scheme. However, because they typically have built in trusted HTML viewers, they allow Java applets to be seen but not copied. Unfortunately IPR protection schemes are not cheap. Worse still, some of the clients are written in 100 percent pure Java and can therefore be decompiled.

The safest protection for Java applications is to compile them into executables. This is an option on many Java compilers—SuperCede, for example. Your code will now be as safe as any C or C++ executables—read a lot safer—but it will no longer be portable because it no longer uses the JVM.

The safest protection for applets is to hide all the interesting code on the web server and only use the applet as a thin, front-end graphical user interface (GUI). This has a downside; it may increase your web server load to unacceptable levels.

Several attempts have been made to encrypt a classfile’s content and then decrypt it in the classloader. Although at first glance this seems like an excellent approach, sooner or later the classfile’s bytecode has to be decrypted in order to be executed by the JVM, at which point it can be intercepted and decompiled.
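The weak point can be sketched in a few lines. This is an assumption-laden illustration, not any vendor's scheme: the class names are hypothetical, and a single-byte XOR stands in for a real cipher. What matters is the call to defineClass, which accepts only plain bytecode.

```java
// Hypothetical sketch: DecryptingLoaderSketch, DecryptingClassLoader, and the
// toy XOR "cipher" are illustrative assumptions, not a real protection product.
final class DecryptingLoaderSketch {

    // Toy symmetric transform standing in for a real cipher; applying it
    // twice with the same key restores the original bytes.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    static final class DecryptingClassLoader extends ClassLoader {
        private final String className;
        private final byte[] encryptedBytes;
        private final byte key;

        DecryptingClassLoader(ClassLoader parent, String className,
                              byte[] encryptedBytes, byte key) {
            super(parent);
            this.className = className;
            this.encryptedBytes = encryptedBytes;
            this.key = key;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) {
                throw new ClassNotFoundException(name);
            }
            // The JVM only accepts plain bytecode, so decryption has to happen here.
            byte[] plain = xor(encryptedBytes, key);
            // 'plain' is now an ordinary class file. Anyone who can wrap or patch
            // this loader can write these bytes to disk and decompile them.
            return defineClass(name, plain, 0, plain.length);
        }
    }
}
```

However strong the cipher, the loader itself must contain or obtain the key, and the decrypted array exists in memory just before defineClass is called.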

Book Outline

Decompiling Java is not a normal Java language book. In fact, it is the complete opposite of a standard Java textbook where the author teaches you how to translate ideas and concepts into Java. You’re interested in turning the partially compiled Java bytecodes back into source code so that you can see what the original programmer was thinking. I won’t be covering the language structure in depth, except where it relates to bytecodes and the JVM. All emphasis will be on Java’s low-level design rather than on the language syntax.

In the first part of this book, Chapters 2 through 4, I’ll unravel the Java classfile format and show you how your Java code is stored as bytecode and subsequently executed by the JVM. You’ll also look at the theory and practice of decompilation and obfuscation. I’ll present some of the decompiler’s tricks of the trade and explain how to unravel the Java bytecode of even the most awkward class. You’ll look at the different ways people try to protect the source code and, when appropriate, learn to expose any flaws or underlying problems with the different techniques so that you’ll be suitably informed before you purchase any source code protection tools.

In the second part of this book, Chapters 5 and 6, I will primarily focus on how to write your own Java decompiler. You’ll build an extendable Java bytecode decompiler. You’ll do this for two reasons. First, although the JVM design is fixed, the language is not. Many of the early decompilers cannot handle Java constructs that appeared in the JDK 1.1, such as inner classes. Second, one of my own personal pet peeves is reading a technical computer book that stops when things are just getting interesting. The really difficult problems are then left to the reader as an exercise. For some unknown reason, this seems to be particularly true of Internet-related books. Partly as a reaction against that mentality, I’m going to go into decompilers in some detail with plenty of practical examples in hopefully as approachable a manner as possible.

And while we’re on the subject of pet peeves—sorry, I’ll try to keep them to a minimum—I won’t be covering a potted history of the Internet or indeed Java. This has been covered too many times before. If you want to know about the ARPANET and Oak, then I’m afraid you’re going to have to look elsewhere.⁹

Conclusion

Java decompilation is one of the best learning tools for new Java programmers. What better way to find out how to write code than by taking an example off the Internet and decompiling it into source code? It’s also a necessary tool when some dotcom web developers have gone belly up and the only way to fix their code is to decompile it yourself. But it’s also a menace if you’re trying to protect the investment of countless hours of design and development.

The aim of this book is to create some dialog about decompilation and source code protection. I also want to separate fact from fiction and show you how easy it is to decompile code and what measures you can take to protect it. Both Sun and Microsoft will tell you that decompilation isn’t an issue and that a developer can always be trained to read a competitor’s Assembler, but separate the data from the instructions and this task becomes orders of magnitude easier. Don’t believe it? Then read on and decide for yourself.

_________________

¹Mocha was one of the early Java decompilers. You’ll see more on Mocha later in this chapter.

²Lexical comes from the word lexicon or dictionary.

³dcc comes from cc, which used to be the standard command-line command for compiling C programs, and still is, if like me you’re IDE impaired.

⁴http://www.threedee.com/jcm/psystem/

⁵http://www.eff.org/Legal/Cases/sega_v_accolade_977f2d1510_decision.html

⁶http://cyber.law.harvard.edu/openlaw/DVD/cases/atarivnintendo.html

⁷http://slashdot.org/article.pl?sid=01/07/17/130226

⁸http://cyber.law.harvard.edu/openlaw/DVD/resources.html

⁹Such as Core Java 2, 6th edition, by Cay S. Horstmann and Gary Cornell (Prentice Hall PTR, 2002).
