Discussion:
[Felix-language] filesystem interaction
Erick Tryzelaar
2007-09-10 07:34:29 UTC
So the sqlite test keeps failing on my mac because the sqlite that
comes with osx doesn't include the ability to test whether a table
exists in a database. So, probably the best thing to do would be to
remove the database first before creating it. However, we don't have
any handy os-independent filesystem commands. Since we need that
anyway, how do you guys think we should implement it? Here's a simple
survey of how others do it:

Python and Ocaml:
Python throws most of its OS interaction into os.py, which makes it a bit
unwieldy. Ocaml does basically the same thing by putting it all in
Unix. I'm betting I don't need to go into too much detail. Since they
both implement one main location for everything, some functions aren't
implemented on every platform, which makes the interface inconsistent.
They throw either OSError for Python or Unix_error for Ocaml on error.
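
As a rough illustration of the OSError style applied to the original problem (removing the database file before recreating it), here is a minimal sketch in modern Python 3. The helper name remove_if_exists and the file name test.db are made up for the example; os.remove and FileNotFoundError are standard.

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete path; return True if a file was removed, False if it was absent."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # FileNotFoundError is a subclass of OSError; other OSErrors
        # (permissions, path is a directory, ...) still propagate.
        return False

# Usage: make sure a stale database file is gone before recreating it.
db_path = os.path.join(tempfile.mkdtemp(), "test.db")  # hypothetical file name
open(db_path, "w").close()
first = remove_if_exists(db_path)    # the file existed
second = remove_if_exists(db_path)   # already gone
```

Only the "file is missing" case is swallowed here; anything else is still reported through the exception.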

[python]: http://docs.python.org/lib/os-file-dir.html
[ocaml]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Unix.html


Ruby:
Ruby puts all of its filesystem routines in one module called
FileUtils. Its operations can be quite high level. For instance, chdir
can optionally take a closure: it will cd into the specified
directory, run the closure, and cd back to the original dir afterwards.
Pretty much every function accepts some of the following options: force,
noop, preserve, and verbose. This saves you from implementing a lot
of these behaviours by hand.
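
The closure-taking chdir could look roughly like the following Python sketch. This is not Ruby's implementation, just the same idea; the function name chdir and its body parameter are chosen for the example.

```python
import os
import tempfile

def chdir(path, body=None):
    """Change directory; if body is given, run it there and then change back."""
    if body is None:
        os.chdir(path)
        return None
    original = os.getcwd()
    os.chdir(path)
    try:
        return body()
    finally:
        # Restore the original directory even if body raises.
        os.chdir(original)

start = os.getcwd()
target = os.path.realpath(tempfile.mkdtemp())
seen = chdir(target, lambda: os.path.realpath(os.getcwd()))
```

The point of the closure form is that the caller cannot forget to change back.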

One interesting idea is that they have some sub-modules that set
default arguments on the operations. For instance, you can use
FileUtils::DryRun where it has the same interface as FileUtils but no
actual modification of the filesystem will take place.
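
One way to get the "same interface, different defaults" effect is sketched below in Python. The class FileOps and the instances Real and DryRun are hypothetical names for the example, and only a single operation (rm) is shown.

```python
import os
import tempfile

class FileOps:
    """Hypothetical wrapper whose option defaults are fixed per instance."""

    def __init__(self, noop=False, verbose=False):
        self.noop = noop
        self.verbose = verbose
        self.log = []          # records what was (or would have been) done

    def rm(self, path):
        if self.verbose:
            self.log.append("rm " + path)
        if not self.noop:
            os.remove(path)

Real = FileOps()
DryRun = FileOps(noop=True, verbose=True)

path = os.path.join(tempfile.mkdtemp(), "scratch.txt")
open(path, "w").close()
DryRun.rm(path)                      # logs the command, touches nothing
still_there = os.path.exists(path)
Real.rm(path)                        # actually removes the file
gone = not os.path.exists(path)
```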

On error, it raises exceptions named after unix's errno values, like
Errno::EEXIST.

[ruby]: http://www.ruby-doc.org/stdlib/libdoc/fileutils/rdoc/index.html


D:
This seems to be similar to Ruby but with less sugar. One nice thing
is that instead of presenting a separate glob function, their listdir
function is overloaded to accept a glob pattern or a regex. Errors are
reported by raising a FileException.
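
A listdir that accepts either kind of pattern could be sketched like this in Python. This is not D's API; the dispatch on the argument type stands in for D's overloading, and the file names are invented for the example.

```python
import fnmatch
import os
import re
import tempfile

def listdir(path, pattern=None):
    """List names in path, optionally filtered by a glob string or compiled regex."""
    names = sorted(os.listdir(path))
    if pattern is None:
        return names
    if isinstance(pattern, re.Pattern):
        return [n for n in names if pattern.search(n)]
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]

root = tempfile.mkdtemp()
for name in ("a.flx", "b.flx", "notes.txt"):
    open(os.path.join(root, name), "w").close()

by_glob = listdir(root, "*.flx")                 # glob pattern as a string
by_regex = listdir(root, re.compile(r"^[an]"))   # compiled regular expression
```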

[d]: http://www.digitalmars.com/d/phobos/std_file.html


Haskell:
It's closer to D than Ruby. The main interesting thing is how errors
are handled via monads. See the second link for how it works. Since we
don't have exceptions, this could be an interesting way to do error
handling. Without type inference, it could be unwieldy, however.
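
To give a feel for error handling without exceptions, here is a loose Python sketch in the Either style: each step returns Ok or Err, and bind skips the remaining steps after the first failure. This is an illustration of the general idea, not Haskell's actual IO error mechanism, and the names Ok, Err, bind, read_text and parse_int are all invented for the example.

```python
import os
import tempfile

class Ok:
    def __init__(self, value):
        self.value = value

    def bind(self, step):
        return step(self.value)      # keep going with the next step

class Err:
    def __init__(self, reason):
        self.reason = reason

    def bind(self, step):
        return self                  # short-circuit: later steps are skipped

def read_text(path):
    try:
        with open(path) as handle:
            return Ok(handle.read())
    except OSError as exc:
        return Err(str(exc))

def parse_int(text):
    try:
        return Ok(int(text.strip()))
    except ValueError:
        return Err("not an integer: " + repr(text))

path = os.path.join(tempfile.mkdtemp(), "count.txt")
with open(path, "w") as handle:
    handle.write("41\n")

good = read_text(path).bind(parse_int).bind(lambda n: Ok(n + 1))
bad = read_text(path + ".missing").bind(parse_int)
```

The caller inspects the final value once instead of checking after every call.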

Another minor interesting note is that they have some predefined
directories, like the home directory, and a function to return an
app's data directory.

[haskell]: http://haskell.org/ghc/docs/latest/html/libraries/base/System-Directory.html
[haskell's basic io]: http://www.haskell.org/onlinereport/io-13.html


QT:
This is the only c++ cross platform library I'm vaguely familiar with.
Instead of combining all the filesystem commands like Ruby, D, and
Haskell, QT just puts the file operations in QFile and all the
directory operations in QDir. Public methods operate on the current
file or directory, whereas static methods operate on a string. Other
than that, it's reasonably low level.

Errors are reported by returning a bool indicating whether the operation
was successful, and setting a global static member value "error" on either
QFile or QDir for more info.

[qt's qfile]: http://doc.trolltech.com/4.3/qfile.html
[qt's qdir]: http://doc.trolltech.com/4.3/qdir.html


I think that pretty much covers everything. So, how shall we do ours?
skaller
2007-09-10 20:25:50 UTC
Post by Erick Tryzelaar
I think that pretty much covers everything. So, how shall we do ours?
Well, there are lots of things covered by 'filesystem' interaction :)

The right way to search for files is probably using an iterator,
which is closely related to a stream of strings and/or a channel
of strings. Filesystem is just another data structure, that
happens not to be in memory.

So you could have an iterator which returns

union file_entry_t =
| Dir of string
| File of string
;

It is then easy to:

filter so only files returned
filter so only directories returned
stack of iterators for recursing into directories
filter based on regexp (using Tre)


Could add:

| Symlink of string // Unix, Windows also has some weird thing
| Special of string // device, blah blah

This union is a kitchen sink.. it provides all the possibilities
even if any given OS doesn't support them all. This allows an algorithm
to be coded once for all OSes, adapting to the OS without knowing which
OS it is .. :)
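
The iterator idea can be sketched in Python with generators. This is only an illustration, not a proposed Felix interface: the Entry tuple plays the role of file_entry_t, and the names classify, entries and walk are invented for the example.

```python
import os
import tempfile
from collections import namedtuple

# kind is one of "Dir", "File", "Symlink", "Special"
Entry = namedtuple("Entry", ["kind", "path"])

def classify(path):
    if os.path.islink(path):
        return Entry("Symlink", path)
    if os.path.isdir(path):
        return Entry("Dir", path)
    if os.path.isfile(path):
        return Entry("File", path)
    return Entry("Special", path)

def entries(directory):
    """Lazily yield one Entry per name directly inside directory."""
    for name in sorted(os.listdir(directory)):
        yield classify(os.path.join(directory, name))

def walk(root):
    """Recurse using an explicit stack of iterators, one per open directory."""
    stack = [entries(root)]
    while stack:
        entry = next(stack[-1], None)
        if entry is None:
            stack.pop()              # this directory is exhausted
            continue
        yield entry
        if entry.kind == "Dir":      # symlinks are not followed
            stack.append(entries(entry.path))

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    open(os.path.join(root, rel), "w").close()

# Filtering is then an ordinary comprehension over the stream.
files_only = [os.path.relpath(e.path, root) for e in walk(root) if e.kind == "File"]
dirs_only = [os.path.relpath(e.path, root) for e in walk(root) if e.kind == "Dir"]
```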
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
Erick Tryzelaar
2007-09-10 22:27:41 UTC
Post by skaller
Post by Erick Tryzelaar
I think that pretty much covers everything. So, how shall we do ours?
Well, there are lots of things covered by 'filesystem' interaction :)
The right way to search for files is probably using an iterator,
which is closely related to a stream of strings and/or a channel
of strings. Filesystem is just another data structure, that
happens not be in memory.
Yeah, that's a good idea. It'd be nice if we had sugar for lazy
lists with support for destructuring through pattern matching. Right
now we'd have to do some semi-ugly things to signify the end of a
list, such as passing in a non-local goto or using option. It'd also be
nice if we could use it as a front end for both generators and
channels.

Oh and there's another appropriate ruby module:

http://www.ruby-doc.org/core/classes/File.html

I do like the interface that ruby presents, and now with the dot operator
change we can almost copy that interface.

Dir::foreach "." (proc (d:string) {
  println d;
});

val d:Dir::t = ...;
d.foreach (proc (d:string) {
  println d;
})

You know, we could probably implement ruby's shorthand functions, where this:

{ |x:int, y:string| y + str(x) }

was equivalent to

gen (x:int, y:string) { ... }

Also, since do-done are already keywords, we could make them
synonymous with braces in this situation, so that we could reduce the
foreach to this, which is much easier to read:

d.foreach do |d:string|
println d;
done;

I'm just not sure how we handle errors, though.
skaller
2007-09-11 02:38:56 UTC
Post by Erick Tryzelaar
Also, since do-done are already keywords, we could make them
synonymous with braces in this situation, so that we could reduce the
d.foreach do |d:string|
println d;
done;
I'm thinking to make begin/end keywords. Begin/end mark a block,
and are equivalent to { } sometimes, like Ocaml.
do/done marks a loop or control structure *without* a scope,
i.e. it's labels and gotos.

So vaguely:

whilst c begin sts end;
==> whilst c do { sts }; done;
Post by Erick Tryzelaar
I'm just not sure how we handle errors, though.
What errors?

If you make a programming error, and Felix detects it,
Felix aborts. This is right IMHO. It's a bug. Fix it!
Felix has a Zero Tolerance policy for bugs.

So .. if you say 'open f' and f doesn't exist, Felix
aborts, as it should. Otherwise use

let h = try_open f in match h with
| Failed => ...
| Handle h => ..
endmatch


Felix does 'exceptions' with some messiness. In procedures,
use a non-local goto:

proc f() {
  proc exc(){ goto err; }
  println "Stuff";
  g (the exc);
  return;
err:>
  println "Error";
}

proc g(exc:1->0) {
  ...
  exc(); // throw exception
  throw exc; // throw exception

  val x = h exc 1;
}

fun h(exc:1->0) (x:int)= {
  if x<0 do throw exc; done;
  return x;
}
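
For readers who don't know Felix, the pattern of passing the escape route down as a value can be approximated in Python. This is an analogy only: Python has no non-local goto, so a private exception class stands in for the jump to the err label, and the log list is added purely so the control flow can be observed.

```python
def f(log):
    class Escape(Exception):
        """Private to this call of f, so only this frame catches it."""

    def exc():
        raise Escape()

    try:
        log.append("Stuff")
        g(exc, log)
        log.append("Done")       # skipped when g escapes
    except Escape:
        log.append("Error")      # plays the role of the err:> label

def g(exc, log):
    value = h(exc, -1)           # h bails out through exc before returning
    log.append("unreachable %d" % value)

def h(exc, x):
    if x < 0:
        exc()
    return x

log = []
f(log)
```

As in the Felix version, g and h never name the handler; they only receive exc and invoke it.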

You have to 'throw' an exception out of a function.
Throw really does use a C++ throw. Probably should make

fun throw[t]: (1->0)->t = "(?1) (throw (con_t*)$1)";

The problem with 'throw' is that it doesn't unwind
procedure stacks properly: it leaves the _caller variable
set, which means if there is a pointer to a variable in
the call chain, the chain is reachable and won't be collected.

Really, we should have call data= _caller + display including
pointer to data object, so unwinding the call data is always
safe, even if there is a pointer to some variable in the
stack frame .. however this would require TWO allocations.

BTW: _throw is a builtin primitive .. :)

Note that EH will be easier to do if we can ensure all
functions throwing are converted to heaped procedures;
unfortunately it's hard to tell if a closure (func or
proc passed as a value) throws.. it would have to be
made part of the type.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
Erick Tryzelaar
2007-09-11 03:33:59 UTC
Post by skaller
I'm thinking to make begin/end keywords. Begin/end mark a block,
and are equivalent to { } sometimes, like Ocaml.
do/done marks a loop or control structure *without* a scope,
i.e. it's labels and gotos.
whilst c begin sts end;
==> whilst c do { sts }; done;
Sure, makes sense. However, I think it'd be reasonable to consider if
we actually need do/done structures without scope. Lots of languages
don't have this construct.
Post by skaller
Post by Erick Tryzelaar
I'm just not sure how we handle errors, though.
What errors?
If you make a programming error, and Felix detects it,
Felix aborts. This is right IMHO. It's a bug. Fix it!
Felix has a Zero Tolerance policy for bugs.
So .. if you say 'open f' and f doesn't exist, Felix
aborts, as it should.
I disagree. I think the runtime library should only bomb if it's
actually a fatal error. Something like a missing file or other
spurious errors *should* be handle-able.
Post by skaller
Otherwise use
let h = try_open f in match h with
| Failed => ...
| Handle h => ..
endmatch
Sure, except if you have to do a lot of kernel interactions, this gets
really ugly. To, say, change a directory tree's group and permissions,
would be very ugly and unwieldy.
Post by skaller
Felix does 'exceptions' with some messiness.
[snip]
Note that EH will be easier to do if we can ensure all
functions throwing are converted to heaped procedures;
unfortunately it's hard to tell if a closure (func or
proc passed as a value) throws.. it would have to be
made part of the type.
This would be doable if we put enough sugar into it. On the other
hand, monadic exceptions might be simpler. I need to read up on them.
skaller
2007-09-11 04:54:31 UTC
Post by Erick Tryzelaar
Post by skaller
I'm thinking to make begin/end keywords. Begin/end mark a block,
and are equivalent to { } sometimes, like Ocaml.
do/done marks a loop or control structure *without* a scope,
i.e. it's labels and gotos.
whilst c begin sts end;
==> whilst c do { sts }; done;
Sure, makes sense. However, I think it'd be reasonable to consider if
we actually need do/done structures without scope. Lots of languages
don't have this construct.
do/done stuff is just goto sugar. You're asking the wrong
question I think. The question is why we need begin/end/scope
block scoped for/while etc statements, when we have Higher
Order Functions???

The goto sugar is useful for low level implementations,
particularly when the HOF version may not be optimised.
Don't forget .. we have no exception handling.

Standard EH is only proper for reporting faults before
program termination. Non-local goto is ugly, but it is
a proper low level primitive (standard EH is NOT).

Some kind of continuation management, e.g. delimited
continuations, may be the best.
Post by Erick Tryzelaar
Post by skaller
So .. if you say 'open f' and f doesn't exist, Felix
aborts, as it should.
I disagree. I think the runtime library should only bomb if it's
actually a fatal error. Something like a missing file or other
spurious errors *should* be handle-able.
Yes and No. Read again: if you say 'open f' and f can't
be opened .. this is a Fatal Error by definition. The name
suggests it opens a file .. and open a file it MUST.

If you actually mean 'try to open a file', don't use the
'necessarily open a file' function with post-condition
'the file is open'.
Post by Erick Tryzelaar
Post by skaller
Otherwise use
let h = try_open f in match h with
| Failed => ...
| Handle h => ..
endmatch
Sure, except if you have to do a lot of kernel interactions, this gets
really ugly.
I don't see how this is any different to what C does, except
we have a disciplined static typing regime. If you use 'try_open'
you HAVE to check the result. If you don't want to check the
result, you have to use 'open'.
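
The split between the two functions can be sketched in Python as follows. The names Failed, Handle, try_open, must_open and first_line_or_default are chosen for the example (must_open stands in for the aborting 'open'), and unlike the statically typed version described above, Python cannot force the caller to check the result.

```python
import os
import sys
import tempfile

class Failed:
    """Result of try_open when the file could not be opened."""
    def __init__(self, reason):
        self.reason = reason

class Handle:
    """Result of try_open when the file is open."""
    def __init__(self, file):
        self.file = file

def try_open(path):
    try:
        return Handle(open(path))
    except OSError as exc:
        return Failed(str(exc))

def must_open(path):
    """Postcondition: the file is open. Otherwise the program stops."""
    result = try_open(path)
    if isinstance(result, Failed):
        sys.exit("fatal: " + result.reason)
    return result.file

def first_line_or_default(path, default):
    result = try_open(path)
    if isinstance(result, Failed):
        return default               # the caller decided failure is acceptable
    with result.file as f:
        return f.readline().rstrip("\n")

directory = tempfile.mkdtemp()
present = os.path.join(directory, "present.txt")
with open(present, "w") as f:
    f.write("hello\n")

found = first_line_or_default(present, "n/a")
missing = first_line_or_default(os.path.join(directory, "absent.txt"), "n/a")
```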
Post by Erick Tryzelaar
To, say, change a directory tree's group and permissions,
would be very ugly and unwieldy.
I agree there is a problem with escaping context -- Felix only
provides non-local goto, and it does NOT work across functions.

But we can't implement a solution for a non-existent proposal.
I have no idea how to solve this properly.

Undelimited type based exceptions (as in C++) are definitely
wrong, polymorphic variant constructor based exceptions like
Ocaml has are better, but also wrong.

Throwing and catching objects and applying a type/constructor
switch is a totally crap idea.

A sane idea is something like: to throw, you write on a channel.
The catch is a procedure reading on the channel.

So .. if you complete normally the channel is forgotten by
the writer and the reader becomes unreachable and suicides.

If you have an exceptional completion, the writer gains
control and continues on, and when it forgets the channel,
the error-ing context suicides.

So using coroutines/continuations here like that is a MUCH better
idea IMHO, though only an idea at present. Want resumable
exceptions? Sure, well the exception handler can do what
it wants and write back on the exception channel, and the
raiser expecting support can try to read the channel.
Etc ..
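
A very rough Python approximation of the resumable case uses a generator as the "raiser": yielding is the write on the exception channel, and the value sent back is the handler's reply. Felix fthreads and channels work differently (in particular nothing here is garbage-collection driven), and the names parse_all and run_with_handler are invented for the example.

```python
def parse_all(items):
    """Raiser: sums integers, asking the handler for help with bad items."""
    total = 0
    for item in items:
        try:
            value = int(item)
        except ValueError:
            # "Write" the problem on the channel, then "read" the reply.
            value = yield item
        total += value
    return total

def run_with_handler(raiser, handler):
    """Drive the raiser, answering each reported problem with handler(problem)."""
    try:
        problem = next(raiser)
        while True:
            problem = raiser.send(handler(problem))
    except StopIteration as done:
        return done.value            # normal completion of the raiser

# The handler decides that unparseable items count as zero.
total = run_with_handler(parse_all(["1", "x", "3"]), lambda bad: 0)
clean = run_with_handler(parse_all(["2", "5"]), lambda bad: 0)
```

If the handler simply stops driving the generator instead of replying, the raiser is abandoned, which loosely corresponds to the non-resumable case.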

The point is..

Felix threads ALREADY subsume exception handling.
Remember .. Felix threads suicide when unreachable so
'unwinding' a context like 'throwing' unwinds the stack
is automatic and already works.

The main problem is that you can't do fthreading inside
functions.

Which is why I implemented the 'get rid of all the
functions' optimisation.. :)

I've checked.. generators convert properly to procedures,
and even 'yield' seems to work. The problem now is simply
that application of closures (expressions for the function)
still use the machine stack.
Post by Erick Tryzelaar
This would be doable if we put enough sugar into it. On the other
hand, monadic exceptions might be simpler. I need to read up on them.
Yep, we will provide sugar when we know WHAT to sugar .. :)

It's a real problem: compiler programming needs to be driven in part
by library implementations, which needs to be driven by user
applications .. and I simply cannot do all three of those things,
in fact I could use a second compiler writer .. and then
there is documentation and advocacy.. ;(

If we could get equivalent of 2-3 full time voluntary work units,
I might double or triple that by throwing money at it.. but we're under
the 1 unit level. For tax reasons I actually have to throw some
cash at something and whatever it is will probably divert attention
from other things.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
Rhythmic Fistman
2007-09-11 08:29:19 UTC
Post by skaller
If you make a programming error, and Felix detects it,
Felix aborts. This is right IMHO. It's a bug. Fix it!
Felix has a Zero Tolerance policy for bugs.
So .. if you say 'open f' and f doesn't exist, Felix
aborts, as it should. Otherwise use
No, a file that you're about to open can be deleted by
other processes. That's not your fault.
skaller
2007-09-11 09:19:12 UTC
Post by Rhythmic Fistman
Post by skaller
If you make a programming error, and Felix detects it,
Felix aborts. This is right IMHO. It's a bug. Fix it!
Felix has a Zero Tolerance policy for bugs.
So .. if you say 'open f' and f doesn't exist, Felix
aborts, as it should. Otherwise use
No, a file that you're about to open can be deleted by
other processes. That's not your fault.
That's irrelevant. If you write a program which just opens
a file and can't work if the file can't be opened, the right
thing to do is abort the program. I have LOTS of such programs,
in fact almost every program I have, including Felix compiler
and most of my utility Python scripts are like that.

Not all programs are like that, for example Felix *.par files
are 'trial' opened but if the open fails it doesn't
matter because the *.par file is just a cache.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net