madbean

Skipping and Seeking: strange Java API discrepancy of the day

18 Mar 2003

There is a common operating system problem when reading very large files: if your OS only supports 32-bit file addressing, you can't "seek" into a file past 4 GB (2^32 bytes / 1024 / 1024 / 1024 = 4 GB).

But this is not a problem for most operating systems nowadays, because you can "skip" past 4 GB. For example, you "seek" to just under 4 GB, then "skip" forward in 4 GB leaps, since each skip is relative to the current position rather than an absolute address. (If you think of a file as a chain of disk blocks, you can see how this kind of skip works for very large files.)
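The "skip forward in leaps" idea can be sketched with java.io.InputStream.skip, which is also allowed to skip fewer bytes than requested, so callers typically loop anyway. This is a minimal sketch, not how any OS or JDK implements it: the helper name skipInLeaps and the leap parameter are hypothetical, and a tiny in-memory stream with a 4-byte leap stands in for a huge file with 4 GB leaps.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipInLeaps {
    /**
     * Hypothetical helper: skips up to n bytes by calling InputStream.skip
     * repeatedly, asking for at most `leap` bytes per call. Returns the number
     * of bytes actually skipped, which is less than n only if skip stops
     * making progress (treated here as end of stream).
     */
    static long skipInLeaps(InputStream in, long n, long leap) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(Math.min(leap, remaining));
            if (skipped <= 0) {
                break; // no progress: assume end of stream
            }
            remaining -= skipped;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        InputStream in = new ByteArrayInputStream(data);
        // A leap of 4 bytes stands in for the post's 4 GB leap (1L << 32).
        System.out.println(skipInLeaps(in, 7, 4));   // 7
        System.out.println(in.read());               // 7
        System.out.println(skipInLeaps(in, 100, 4)); // 2 (only two bytes were left)
    }
}
```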

And Java supports skipping and seeking too: java.io.InputStream has skip, and java.io.RandomAccessFile has seek. Both methods take a long (a byte count for skip, an absolute file position for seek), and make no mistake, that means up to 8,589,934,591 GB ((2^63 - 1) bytes / 1024 / 1024 / 1024). (On 32-bit OSes, I assume these methods must be implemented with multiple skips.)
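The size figures above are easy to check with Java's own integer limits. This small sketch (the class and method names are illustrative) divides by 2^30 with integer division, so results are rounded down to whole gigabytes. The int line is included only for comparison with the long one.

```java
public class SizeLimits {
    // 1 GB here means 2^30 bytes, matching the post's /1024/1024/1024.
    static final long GB = 1024L * 1024 * 1024;

    // Whole gigabytes in a byte count (integer division rounds down).
    static long wholeGigabytes(long bytes) {
        return bytes / GB;
    }

    public static void main(String[] args) {
        // 32-bit file addressing: 2^32 bytes
        System.out.println(wholeGigabytes(1L << 32));          // 4
        // Largest long, the type taken by InputStream.skip and RandomAccessFile.seek
        System.out.println(wholeGigabytes(Long.MAX_VALUE));    // 8589934591
        // Largest int, for comparison: 2^31 - 1 bytes is just under 2 GB
        System.out.println(wholeGigabytes(Integer.MAX_VALUE)); // 1
    }
}
```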

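Interestingly, RandomAccessFile implements java.io.DataInput and java.io.DataOutput, but does not extend java.io.InputStream or java.io.OutputStream; I've always found that curious. (InputStream and OutputStream are abstract classes rather than interfaces, so with Java's single inheritance a class could extend at most one of them.)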
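If you do want to hand a RandomAccessFile to code that expects an InputStream, a thin adapter is one option. This is a minimal sketch, not a java.io class: RandomAccessInputStream is a hypothetical name, and it implements InputStream.skip(long) with RandomAccessFile.seek(long), clamped so it never moves past the end of the file.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Hypothetical adapter (not part of java.io): a RandomAccessFile viewed as an InputStream. */
public class RandomAccessInputStream extends InputStream {
    private final RandomAccessFile raf;

    public RandomAccessInputStream(RandomAccessFile raf) {
        this.raf = raf;
    }

    @Override
    public int read() throws IOException {
        return raf.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return raf.read(b, off, len);
    }

    /** Skips by seeking forward, but never past the end of the file. */
    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        long pos = raf.getFilePointer();
        long available = Math.max(0, raf.length() - pos);
        long toSkip = Math.min(n, available); // avoids overflow of pos + n
        raf.seek(pos + toSkip);
        return toSkip;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("raf-demo", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.write(new byte[] {10, 20, 30, 40, 50});
        }
        try (InputStream in = new RandomAccessInputStream(new RandomAccessFile(f, "r"))) {
            System.out.println(in.skip(3));   // 3
            System.out.println(in.read());    // 40
            System.out.println(in.skip(100)); // 1 (only one byte was left)
        }
    }
}
```

The standard library also offers a ready-made route, java.nio.channels.Channels.newInputStream(raf.getChannel()), which wraps the file's channel as an InputStream.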

But I find the discrepancy between the following methods more than curious; it's downright strange:

  • The java.io.InputStream class has a skip method like this:
    public long skip(long n) throws IOException
  • The java.io.DataInput interface has a skipBytes method like this:
    public int skipBytes(int n) throws IOException

For starters, they are named differently. But more importantly, why does one take (and return) an int while the other takes a long? An int caps skipBytes at Integer.MAX_VALUE (2^31 - 1) bytes, just under 2 GB. I mean, are you less likely to want to skip 8,589,934,591 GB ahead in a file if you are using DataInput as opposed to InputStream?

*smirk*
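Until then, a caller holding only a DataInput can work around the int limit by calling skipBytes in int-sized pieces. This is a sketch under assumptions: skipLong is a hypothetical helper, and it treats a skipBytes call that returns 0 as end of input, since the DataInput contract lets skipBytes skip fewer bytes than requested without throwing.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

public class SkipLong {
    /**
     * Hypothetical helper: skips up to n bytes of a DataInput, even when n is
     * larger than Integer.MAX_VALUE, by calling skipBytes in int-sized pieces.
     * Returns the number of bytes actually skipped.
     */
    static long skipLong(DataInput in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            int chunk = (int) Math.min(remaining, Integer.MAX_VALUE);
            int skipped = in.skipBytes(chunk);
            if (skipped <= 0) {
                break; // skipBytes made no progress; assume end of input
            }
            remaining -= skipped;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        DataInput in = new DataInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        System.out.println(skipLong(in, 2));                      // 2
        System.out.println(in.readByte());                        // 3
        System.out.println(skipLong(in, 10L * Integer.MAX_VALUE)); // 2 (only two bytes were left)
    }
}
```

Note that the concrete classes often offer both: DataInputStream inherits skip(long) from InputStream alongside skipBytes(int), and RandomAccessFile has seek(long) alongside skipBytes(int). The int limit only bites when all you have is the DataInput interface.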
