
Saturday, November 8, 2008

Three things that Java could learn from C++

This is a mostly off-topic rant that is being posted because I know that a good portion of our user base is composed of programmers.

Ever since I started working with Java, a few months ago, there have been many things that I have felt SHOULD be there, but aren't. Three, in particular, have always bothered me: stack building, operator overloading, and const-correctness. For the rest of this post, I will write snippets in both C++ and in Java to illustrate the differences.


1. Stack building
Creating objects on the heap is slow. Not only does it involve complex algorithms for determining WHERE to allocate the memory, but it also creates more work for the garbage collector once you're done with the object. In many situations, you will have to allocate many small temporary objects and then never use them again. Stack building makes this much faster, not to mention EASIER. Since Java lacks this feature, you are forced to use annoying workarounds (such as keeping pools of objects) if performance becomes critical.

C++:

bool isNormalPointingAt(Vector3f a,Vector3f b, Vector3f dir) {
    return (a.cross(b).dot(dir) > 0);
}

Java:
boolean isNormalPointingAt(Vector3f a,Vector3f b, Vector3f dir) {
    Vector3f normal = new Vector3f();
    normal.cross(a,b);
    return (normal.dot(dir) > 0);
}

In the above C++ example, the result of a.cross(b) is stored in a temporary object on the stack, which is then dot()ed with dir. This could be done in Java if every such method allocated and returned a new instance, but that would quickly become prohibitively slow.
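
As a rough sketch of the kind of workaround mentioned above, the Java version could keep one scratch vector around and reuse it on every call instead of allocating a temporary. Vec3 and NormalTester are hypothetical names made up for this example (they are not the real Vector3f API), and the approach assumes single-threaded use.

```java
// Hypothetical minimal vector type, standing in for a real Vector3f class.
class Vec3 {
    float x, y, z;

    Vec3() {}

    Vec3(float x, float y, float z) {
        this.x = x; this.y = y; this.z = z;
    }

    // Stores the cross product a x b in this vector (safe even if a or b is this vector).
    void cross(Vec3 a, Vec3 b) {
        float cx = a.y * b.z - a.z * b.y;
        float cy = a.z * b.x - a.x * b.z;
        float cz = a.x * b.y - a.y * b.x;
        x = cx; y = cy; z = cz;
    }

    float dot(Vec3 o) {
        return x * o.x + y * o.y + z * o.z;
    }
}

// Keeps one scratch vector alive so that repeated calls allocate nothing.
// Not thread-safe: each thread would need its own NormalTester.
class NormalTester {
    private final Vec3 scratch = new Vec3();

    boolean isNormalPointingAt(Vec3 a, Vec3 b, Vec3 dir) {
        scratch.cross(a, b);
        return scratch.dot(dir) > 0;
    }
}
```

This avoids the per-call allocation, but at the cost of extra state and a thread-safety caveat that the C++ version never has to think about.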


2. Operator Overloading
This is a hotly debated topic. Many people insist that operator overloading can lead to unreadable code if you start overloading operators to do illogical things - but the same is true for method names: you can have an "add()" method that performs a multiplication. However, especially when working with mathematical vectors, operator overloading makes your code way easier to understand. Below is a method that, given 4 control points and a "t" parameter in the range [0,1], finds the corresponding point along a cubic Bézier curve:

C++:
Vector3f getBezierCubicPoint(Vector3f a, Vector3f b,
        Vector3f c, Vector3f d, float t) {
    float u = 1-t;
    return (u*u*u)*a + (3*u*u*t)*b + (3*u*t*t)*c + (t*t*t)*d;
}

Java:
Vector3f getBezierCubicPoint(Vector3f a, Vector3f b,
        Vector3f c, Vector3f d, float t) {
    float u = 1-t;
    Vector3f result = new Vector3f();
    Vector3f tmp = new Vector3f();
    tmp.scale(u*u*u, a);
    result.add(tmp);
    tmp.scale(3*u*u*t, b);
    result.add(tmp);
    tmp.scale(3*u*t*t, c);
    result.add(tmp);
    tmp.scale(t*t*t, d);
    result.add(tmp);
    return result;
}


<sarcasm>Yeah, operator overloading really made that code a lot harder to read...</sarcasm> Operator overloading is a tool. It can be very useful, as the example above demonstrates. I don't think that the language designers should remove that power from their users just because some won't know how to use it wisely.

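For comparison, the closest Java can get without operators is probably a chain of method calls on an immutable vector type. The following is only a sketch: Vec3, plus(), times() and Bezier are names invented for this example, and every call allocates a new object, which runs straight back into point 1. It uses the standard cubic Bézier form with weights 1, 3, 3, 1.

```java
// Hypothetical immutable vector; every operation returns a new instance.
final class Vec3 {
    final float x, y, z;

    Vec3(float x, float y, float z) {
        this.x = x; this.y = y; this.z = z;
    }

    Vec3 plus(Vec3 o) {
        return new Vec3(x + o.x, y + o.y, z + o.z);
    }

    Vec3 times(float s) {
        return new Vec3(x * s, y * s, z * s);
    }
}

class Bezier {
    // Standard cubic Bezier form with Bernstein weights 1, 3, 3, 1.
    static Vec3 cubicPoint(Vec3 a, Vec3 b, Vec3 c, Vec3 d, float t) {
        float u = 1 - t;
        return a.times(u * u * u)
                .plus(b.times(3 * u * u * t))
                .plus(c.times(3 * u * t * t))
                .plus(d.times(t * t * t));
    }
}
```

It reads better than the tmp/result version, but it is still noisier than the one-line C++ expression, and it trades mutation for seven short-lived allocations per call.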

3. Const-correctness
This one is possibly even more infuriating than the above two points, because it makes proper encapsulation of data much more complicated (not to mention slower). Consider this class in C++ and Java:


C++:
class Body {
public:
    const Vector3f& getPosition() const {
        return position;
    }
    void setPosition(const Vector3f& pos) {
        position = pos;
    }

private:
    Vector3f position;
};

Java:
public class Body {
    public Vector3f getPosition() {
        return position;
    }
    public void setPosition(Vector3f pos) {
        position = pos;
    }
    private Vector3f position = new Vector3f();
}

What's wrong with the above example? Here, let me illustrate it with some code:

C++:
Body body;
body.getPosition().x = 5.0f; // compile error

Java:
Body body = new Body();
body.getPosition().x = 5.0f; // works

Basically, Java allows you to modify an object's private member without using its "set" method! There is no way to declare that a given object is "read-only" (final only means that the reference can't be reassigned), so you can't prevent that code from working. If you really must be sure that position can't be modified like that, then you have to change your Java class to this:

Java:
public class Body {
    public Vector3f getPosition() {
        return (Vector3f)position.clone();
    }
    public void setPosition(Vector3f pos) {
        position = (Vector3f)pos.clone();
    }
    private Vector3f position = new Vector3f();
}

In other words, you're now returning a full copy of the object. There are two major problems with this. First, the previous code STILL COMPILES. Since the user has no way to tell whether he's getting the actual position or a copy of it (unless he reads the original source or looks in the documentation), he might try to modify the position that he got, and then be surprised that it doesn't work. The second problem is that returning an entire copy might be SLOW. What if, instead of a vector, it was an image? And what if it was accessed many times per second? This could quickly become impossibly slow. Java offers no solution for this problem.

(In case you're wondering, I also had to add a clone() to the set method, because otherwise the caller might call the set method, but keep the reference that it passed to it and modify it later.)
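
To make the first problem concrete, here is a small self-contained sketch of the cloning version. Vec3 is a hypothetical stand-in for Vector3f with public fields, and DefensiveCopyDemo is just a name picked for the example.

```java
// Hypothetical mutable vector with public fields, like the one used in the post.
class Vec3 implements Cloneable {
    float x, y, z;

    @Override
    public Vec3 clone() {
        try {
            return (Vec3) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Vec3 is Cloneable
        }
    }
}

class Body {
    private Vec3 position = new Vec3();

    Vec3 getPosition() { return position.clone(); }

    void setPosition(Vec3 pos) { position = pos.clone(); }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        Body body = new Body();
        body.getPosition().x = 5.0f;              // compiles, but only changes a copy
        System.out.println(body.getPosition().x); // the body itself is unchanged

        Vec3 p = new Vec3();
        p.x = 1.0f;
        body.setPosition(p);
        p.x = 2.0f;                               // caller keeps mutating its own vector
        System.out.println(body.getPosition().x); // the body still holds its own copy
    }
}
```

The encapsulation holds, but the write on the first line of main() is silently thrown away instead of being rejected by the compiler.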


Conclusion
While many programs don't suffer from those problems much, there are certain applications that become a true nightmare to write and maintain - physics simulations or anything to do with vector math are the obvious examples (that's why I kept using "Vector3f" classes above). If Java is supposed to be a cleaner and easier version of C++, then why is it that writing that sort of code in Java is much harder and much less robust?

I am aware that C# supports some (all?) of the above, but it doesn't as yet have as much portability as Java does. Indeed, it seems that C# is what Java SHOULD HAVE BEEN. If Sun doesn't start fixing this sort of thing in Java, then Microsoft just might take the lead. Meanwhile, I'll have to stick to writing that kind of abomination in Java. To think that it's much easier to code this sort of thing in C++...




22 comments:

  1. You're a few years late on point one:

    http://www.ibm.com/developerworks/java/library/j-jtp09275.html

    Java has its problems, but core language performance is rarely one of them these days.

  2. JBullet is a physics engine written in pure Java by an ex-programmer from Havok. He had to create his own vector stack class to avoid the allocation problem in Java:

    http://www.javagaming.org/index.php/topic,18843.0.html

    Here's what he says about it:

    "About the garbage creation, the problem is not in allocating new objects or garbage collection per se, in fact HotSpot is very good in this. But when your garbage creation exceeds some range (like in JBullet) the GC is called very often, eg. 10x or more per second and it badly affects performance, both in throughput and frame to frame jerkiness which is very visible in game."

    This is from only a few months ago.

  3. It's best to see the notion of "heap" as a more abstract concept in Java. When you create an object in Java, it could actually get allocated on the stack if the VM chooses. Obviously, the VM has to make some decision as to whether this is possible and worthwhile. And obviously, in a language like C where that choice is left to the programmer, there are cases where the programmer can make optimisations (decisions to put things on the stack) where a VM algorithm might not. But however suboptimal, the JVM will always make a *correct* decision: for example, it will never accidentally allocate something on the stack and then let a pointer to that object escape from the function where it is allocated. In typical applications, this guarantee of correctness outweighs the need to control object allocation, and the JVM's allocation is "good enough". But yes, there'll be some corner applications. If you've found one, congratulations, use C instead. Your memory management will be far more of a pain in the arse, but yes you'll be able to optimise it.

    Similarly, if you find that C doesn't optimise loops sufficiently, you can always write in assembler. It's essentially the same argument...

    Your statements about variable access aren't quite correct, or at least not fully accurate. You can only modify a private variable from *within* that same class (or an embedded class).
    And if you declare an instance variable final (be it a primitive or reference), it can't be modified once set (and indeed must be set by the time the constructor exits). This is crucial as of Java 5, as it now provides an efficient mechanism for creating immutable, thread-safe objects.

    If Vector3f lets you modify its component variables directly, that's just a decision that was made by the writers of that class; they could have made it impossible.

  4. Thanks for the feedback. Yes, I realize that #1 isn't as much of an issue as I thought it was, after I read p-static's article, but it's still some extra cumbersome syntax, especially if operator overloading were to be involved.

    As for the const-correctness, declaring a variable final only makes it so you can't reassign it, but you can still modify its contents.

    So, for example, in Java:
    final Foo foo = new Foo();

    Is equivalent to C++'s
    Foo * const foo = new Foo(); // Can't reassign foo

    But different from C++'s
    const Foo * foo = new Foo(); // Can't modify foo
    or
    const Foo * const foo = new Foo(); // Can neither modify nor reassign foo

    In the class that I've posted above, there's no way to "secure" the inner Vector3f - even if Vector3f declared x, y and z as private, I could still do:

    body.getPosition().setX(5.0f);

    The only way around that would be to have getPosition() return a DIFFERENT class that does not expose any setter, but that won't work as well if you're working with, say, images.

  5. I'm surprised that you didn't mention the mysterious lack of the 'unsigned' keyword in Java. In terms of pure function, adding this doesn't seem like it would be terribly complicated, and it would remove a lot of ridiculous hacks to get around it.
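
For readers who have not run into them, the hacks in question usually amount to widening the value and masking off the sign extension. A minimal sketch follows (UnsignedDemo and toUnsigned are names made up for this example). Later Java versions (8 and up) added library helpers such as Byte.toUnsignedInt and Integer.toUnsignedLong that do the same thing, but still no unsigned keyword.

```java
class UnsignedDemo {
    // Reads a signed Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Reads a signed Java int as an unsigned value in the range 0..2^32-1.
    static long toUnsigned(int i) {
        return i & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;               // stored as -16 in a signed byte
        System.out.println(toUnsigned(raw));  // 240
        System.out.println(toUnsigned(-1));   // 4294967295
    }
}
```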

  6. amz -- you're really just talking about design decisions (good or bad) of the Vector3f class. If you want to create an immutable 3-component vector class in Java, you really can do that! Yes, Vector3f lets you modify the vector. So create a class with all instance variables declared private and then don't provide a set() method.

    If you need to *subclass* an existing class such as Vector3f and "remove" the set() methods, then you can override them and throw UnsupportedOperationException():

    public class ImmutableVector3f extends Vector3f {
        public void setX(float x) {
            throw new UnsupportedOperationException();
        }
        ...
    }

    In general, it's good practice for publicly accessible fields of a class to be accessed via get/set methods rather than declaring the actual internal variables public. Unfortunately there are a few rogue classes in the JDK, such as Rectangle (and possibly Vector3f -- I don't remember offhand) that naughtily have their internal variables declared public. That's just bad design on the part of those particular classes. In principle, Java doesn't need something like 'const' because control of internal state of a class should be delegated to methods with a clear accessibility policy (public, protected...).

  7. First of all, I sure do agree that public variables should be avoided at all costs! Aegisub violates that quite often, and it typically makes me regret it later. ;)

    But you can certainly see that having to create a subclass for every class that I want to keep const-correctness is not very sane. This has nothing to do with Vector3f - I'm only using it as an example. Consider this:

    public class A {
        private int i;
        public int getI() { return i; }
        public void setI(int value) { i = value; }
    }

    public class B {
        private A member = new A();
        public A getMember() { return member; }
        public void setMember(A value) { member = value; }
    }

    public class C {
        private B member = new B();
        public B getMember() { return member; }
        public void setMember(B m) { member = m; }
    }

    Then say that you get an instance of C that you aren't allowed to modify. In other words, you can't do this:

    C foo = getC();
    foo.getMember().getMember().setI(5);

    Your solution would involve these new classes:

    public class Aconst extends A {
        public Aconst(A a) { super.setI(a.getI()); }
        public void setI(int value) { throw new UnsupportedOperationException(); }
    }
    public class Bconst extends B {
        public Bconst(B b) { super.setMember(b.getMember()); }
        public void setMember(A value) { throw new UnsupportedOperationException(); }
    }
    public class Cconst extends C {
        public Cconst(C c) { super.setMember(c.getMember()); }
        public void setMember(B m) { throw new UnsupportedOperationException(); }
    }

    The getters of B and C would also have to be modified as follows:

    public A getMember() { return new Aconst(member); }
    public B getMember() { return new Bconst(member); }

    And we'd still be left with the problem that the code WILL COMPILE, and only cause a RUN-TIME error. A possible solution would be to move all the setters to the derived class, I guess. Either way, I'm sure that we can agree that this can hardly be called a good solution to this all-too-common problem.

    You could say that const-correctness isn't something that you need often, but I say that it's something that you ALWAYS need - much like encapsulating members by declaring them private, I consider it a vital aspect of OOP.
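
The "setters only in the derived type" idea from the comment above could be sketched as follows. This is one possible reading of it, using a read-only interface; ReadOnlyA, MutableA and setMemberValue are names invented for the example and are not part of the classes discussed above.

```java
// Read-only view: exposes getters only, so code holding this type
// cannot call a setter at all (a compile-time error, not a run-time one).
interface ReadOnlyA {
    int getI();
}

// Mutable implementation: the only place where the setter exists.
class MutableA implements ReadOnlyA {
    private int i;

    public int getI() { return i; }

    void setI(int value) { i = value; }
}

class B {
    private final MutableA member = new MutableA();

    // Outside callers only ever see the read-only type.
    ReadOnlyA getMember() { return member; }

    void setMemberValue(int value) { member.setI(value); }
}

class ReadOnlyDemo {
    public static void main(String[] args) {
        B b = new B();
        b.setMemberValue(5);
        System.out.println(b.getMember().getI());
        // b.getMember().setI(7);  // would not compile: ReadOnlyA has no setI
    }
}
```

This does move the failure to compile time, but it has to be written by hand for every class, and a determined caller can still downcast the result to MutableA, much like const_cast in C++.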

  8. (Disclaimer: I've tended to work with Java exclusively for my whole working life of all of 1 year, so my thinking may be coloured accordingly.)

    Point 1: Yes yes yes! Pointer-bump heap allocation may be 2 instructions, garbage collection may on average be miles faster than manual... But they are still both slower than 0 cost! The lack of cast-iron escape analysis guarantees, or of ways to manually specify stack allocation, genuinely does make high performance numerical computing harder than it needs to be.

    Point 2: Well said - the arguments seem to boil down to:
    Pro: Arithmetic on Java's arbitrary precision decimal class, matrices, vectors, complex numbers etc are ridiculously verbose and ugly.
    Con: People will apply them inappropriately and they are hard to look up.

    Which seems reasonably balanced to me and I have lived through far more matrix maths in Java than I have abominable operator mess in C++.

    Would just say, though, that I prefer all (Lisp, Haskell etc) or nothing (C, Java etc). Picking an arbitrary subset of possible operators and enforcing their precedence is a half-baked solution that is just going to lead to nonsense - does anyone seriously claim bitshifting ostreams by char arrays is sensible syntax?

    3: Some kind of compiler-enforced enhanced final might save a bit of bother, but by and large I like to design such that accessors on value objects are kept to an absolute minimum, so in practise don't find it a problem.

    Java also does not have much to learn about the matter from a language confused enough to have const, mutable and const_cast as keywords, not to mention the exact same problem of non-const pointers/references to const values passed to setPosition. ;)

  9. About the 3 points I only agree with the 2nd one, but not because of that reason. I think operator overloading just supports the "uniform access" principle.

    Let's suppose a class that implements any kind of sorting method.

    class SortMethod1 {
    public List<Comparable> sort(List<Comparable> l){
    while(...)
    if(o1 > o2) exchange(l,o1,o2);
    }
    }
    You might have that functionality as a template instead of a class and the only thing you need to do is to implement the operator ">" for the Class you want to sort.
    In that way, it doesn't matter if you are using strings, int or char, your algorithm will sort them.


    And... what do you say about passing callback functions as a parameter? Don't you think it would be interesting too?
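
For reference, Java's usual substitute for an overloaded ">" is compareTo() on Comparable, and later Java versions (8 and up) let an ordering be passed in as a lambda callback. A rough sketch of both, with SortSketch as a made-up name and insertion sort chosen only because it is short:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortSketch {
    // Works for any type implementing Comparable, with compareTo()
    // playing the role of an overloaded ">" operator.
    static <T extends Comparable<T>> List<T> sort(List<T> input) {
        return sort(input, (a, b) -> a.compareTo(b));
    }

    // Same insertion sort, but the ordering is passed in as a callback.
    static <T> List<T> sort(List<T> input, Comparator<T> order) {
        List<T> out = new ArrayList<>(input);
        for (int i = 1; i < out.size(); i++) {
            for (int j = i; j > 0 && order.compare(out.get(j - 1), out.get(j)) > 0; j--) {
                T tmp = out.get(j);
                out.set(j, out.get(j - 1));
                out.set(j - 1, tmp);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Natural ordering via Comparable.
        System.out.println(sort(Arrays.asList(3, 1, 2)));
        // Custom ordering (by string length) via a callback.
        System.out.println(sort(Arrays.asList("pear", "fig", "apple"),
                (a, b) -> a.length() - b.length()));
    }
}
```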

  10. Regarding const access: it's been a while since I coded in C++, but my conclusion back then was that it's a very nice idea in theory but breaks down very fast in large projects. The problem is that if one class is not diligent in declaring what operations are const and what aren't, then it sort of spoils it for the rest of us. For example, if you have
    class Foo {
    public:
        int getX() { return x; }
    private:
        int x;
    };
    getX is really a constant operation, but it isn't declared as such. Now, this class isn't mine, but I want to use it. If I try to be a good citizen and declare my const's correctly:
    class Bar {
    public:
        int getFooX() const { return foo.getX(); }
    private:
        Foo foo;
    };
    The compiler won't believe me that getX() is const and will complain, and I have to resort to casts if I am to remain faithful. Of course, the correct solution is to fix Foo, but when coding in an existing huge project that's filled with these, you get to learn quickly to abandon all hope and just give up on const... Your luck may have been better than mine though.

  11. I've programmed professionally for over 12 years now. I have 6 years experience in C and C++ (2 and 4 respectively), and 6 years experience in Java.

    Firstly, to clear up a misconception: Java is not a replacement for C++. It is an alternative to C++ for a large number of applications, but not all.

    For the most part of your complaint, what I see is 30% language issues and 70% a sub-optimal API on the Java side (likely chosen to closely match what C++ programmers are used to).

    As was stated earlier, the JVM decides whether to allocate on the stack or on the heap based on the context. It's not as efficient as it could be if you controlled it yourself, but it does guarantee that anything allocated is freed properly, and I for one am sick to death of those kinds of memory leaks. Java has a number of performance/stability tradeoffs that in general make sense.

    Operator overloading, like checked exceptions, is one of the design mistakes made with Java. After the horrible implementation in C++ it is understandable that the Java creators would shy away from operator overloading, but they threw the baby out with the bath water. Unfortunately, there doesn't seem to be any policy change coming down the pipeline. Too bad, considering they finally conceded some ground on generics.

    I would definitely have liked to have better control over what is read-write and what is read-only without the boilerplate of getters/setters. What would have been stellar is a property scheme, where you access them as if you were accessing a real member, but the developer can control the behavior via property methods (or just state that the property is direct access for read and/or write). Closures would be nice, too.

    My biggest complaint about Java so far is all the bloody boilerplate code. A language should make it EASIER to abstract, not harder. It's one of the reasons I like Python.

  12. > I would definitely have liked to have better control over what is read-write and what is read-only without the boilerplate of getters/setters.

    Sounds like Delphi...

  13. Hi amz,
    Nice writeup. I like the way you have compared C++ with Java.
    Here is a tutorial on internal of Java class file, i.e. structure of a Java Class.
    Read this: Java Class File Format

    Do post latest content. I like to visit your site.

  14. As far as I'm concerned, "I don't think that the language designers should remove that power from their users just because some won't know how to use it wisely." sums up Java perfectly.

