Thursday, May 5, 2011

Synchronizing Files With F# and the FileSystemWatcher

So I needed a way to automate change tracking on a set of directories and have those changes merged to another set of directories. In my case, I'm dealing with directories that have a similar makeup. That is to say, they contain the same files, folders, etc., just in different locations. They're essentially clones of one another. The team I'm currently working on calls these packages. I certainly don't agree with the way they implemented it and all the duplication, but I'm not going to manually copy my changes to 2 other directories all day long. So I came up with a simple utility to do the work for me.

It's not fully polished yet, but I wanted to get my initial implementation up online. If you actually tried to use the program, things would work, aside from the IOException that's thrown on subsequent saves due to the host process somehow maintaining a lock on the files. Now normally it's the developer's fault, and it probably is in my case, but based on the very nature of the function I'm calling, and what it promises to do for me, I doubt it. I'll resolve it in the coming days though.

Lastly, I'm working purely with code (.cs, .aspx, .ascx, etc.) so I can get away with calling File.ReadAllText. I'm not even going to think about binary. Maybe it'd still work. Who cares...lol. Enough talking already. Here's the code.

Iteration I - 4 April, 2011

// Learn more about F# at http://fsharp.net
open System.Xml.Linq
open System.Reflection
open System.IO
open System.Linq
open Microsoft.FSharp.Control
open System.Threading
open System

let pathAttributeName = "path"
let xs n = XName.Get(n)
let wildcard = "*.*"

let workflow = async {
    // Load Synch.config from the directory that contains the executable.
    let exeDir = Assembly.GetExecutingAssembly().Location |> Path.GetDirectoryName
    let config = XDocument.Load(Path.Combine(exeDir, "Synch.config")).Root
    let rootDir = config.Element("root" |> xs).Attribute(pathAttributeName |> xs).Value

    // Resolve each <add path="..." /> entry against the root directory.
    let mapdirs (e : XElement) =
        e.Elements("add" |> xs)
        |> Seq.map (fun (e : XElement) -> Path.Combine(rootDir, e.Attribute(pathAttributeName |> xs).Value))
        |> Seq.toList

    let directories = config.Element("directories" |> xs)
    let masters = directories.Element("masters" |> xs) |> mapdirs
    let slaves = directories.Element("slaves" |> xs) |> mapdirs

    // Copy the changed file's text over the same-named file in every slave directory.
    let merge (fp : string) (fn : string) =
        let content = fp |> File.ReadAllText

        slaves
        |> List.iter (fun d ->
            let fileName = Path.Combine(d, fn)

            if File.Exists fileName then
                try
                    File.WriteAllText(fileName, content)

                    printfn "Merged %s to %s at %A %s" fp fileName DateTime.Now Environment.NewLine
                with
                | :? IOException ->
                    printfn "Antwan said he'd handle it later. He's eager to get his post up now!!"
            else
                printfn "File %s did not exist in directory %s. No merge required. Aborting...%s" fileName d Environment.NewLine)

    // One watcher per master directory, created once, before we start waiting for input.
    let directoryWatchers =
        masters
        |> List.map (fun d ->
            let w = new FileSystemWatcher(d, EnableRaisingEvents = true, Filter = wildcard)
            w.Changed.Add(fun e -> merge e.FullPath e.Name)
            w)

    printfn "Started listening at %A..." DateTime.Now

    // Keep the main thread alive until the user types "q".
    while Console.ReadLine() <> "q" do
        ()

    directoryWatchers |> List.iter (fun w -> w.Dispose())

    printfn "Stopped listening at %A..." DateTime.Now
}

let start() = 
    workflow |> Async.RunSynchronously

do start()

And here's the configuration file I use. No, it's probably not the most intuitive XML file you've ever seen, but it works for me. I called it Synch.config and placed it in my bin/Debug directory.

<watch>
 <root path="C:\Users\A-Dubb\Documents\" />
 <directories>
    <!-- Directories I'll be working in -->
    <masters>
      <add path="TestDir" />
    </masters>
    <!-- Directories I want my work merged to -->
    <slaves>
      <add path="TestDirII" />
    </slaves>
 </directories>
</watch>

It's nothing too complex. I just listen for changes in the master directories and merge them to the slave directories. Pretty cool though. It's definitely a good candidate for a Windows Service. I cheated with a while loop to force the main thread to wait on me without exiting the program. There are numerous other ways to achieve that behavior, but it was quick and painless. I bet you something like Dropbox makes use of a construct similar to FileSystemWatcher to keep your files in sync between machines. I'll upload the patch to resolve the IOException once I have time to delve into it.

Wanna get her up and running quickly? You got it. Just download Funtastic. It's a lightweight F# editor. Basically just a wrapper around F# Interactive. She's quite handy though.

For now, adios my friends.

Iteration II - 5 April, 2011

Ok. So I figured out what the problem is. First off, my exception handling code is in the wrong place. It should wrap the attempt to read the file that was actually changed, not the writes to the files that need to be patched. Number 2, since I'm subscribing to the Changed event, it seems to get triggered just by me simply reading the file. My theory is that the OS changes the file's metadata when it's read (the last access time). So the Changed event fires so quickly that while I'm still reading the file the first time around, I attempt to read it again. Don't believe me? Open up one of your tracked files in Notepad++ and watch it get logged to the console. Even better, upon running the application, you'll notice that you always see the same file get merged to each directory twice. So instead of seeing 2 sets of output, you see 4. I'll have to find a way to suppress notifications for reads.

I did try opening the file with FileAccess.Read and FileShare.Read. That didn't work 100% of the time but did seem to be a lot better than what I had before. I also like how ReadAllLines and ReadAllBytes are more high level. I don't have to worry about managing streams, disposing them, reading them, etc. The problem is, you don't have control over the access and sharing modes used to open the file because of the defaults .NET sets for you. I'd never have a source file that's over 2 gigs, but that's the most you can load in memory with my current approach, because the buffer length is an int and tops out at Int32.MaxValue. Maybe the guys at Microsoft know a way around that with their implementation. Who knows?

Lastly, I'm working with raw bytes now since that's the fundamental makeup of every file whether it be binary or text based. So I take back my statement from earlier. I kind of do actually care now. I thought I'd have to make some fancy factory that knows how to read and write each file based on its extension. That'd be one of three things: either an infinite switch block, a jam packed dictionary, or a regex longer than the Mississippi.

Anyway, here's my current revision. You're probably starting to think I'm trying to obsolete git by now. Forgive me. I just want an immediate view of how many times I took a swing at this thing. Don't worry. I'll call it a strikeout at 3.

// Learn more about F# at http://fsharp.net
open System.Xml.Linq
open System.Reflection
open System.IO
open System.Linq
open Microsoft.FSharp.Control
open System.Threading
open System

let pathAttributeName = "path"
let xs n = XName.Get(n)

let workflow = async {
    // Load Synch.config from the directory that contains the executable.
    let exeDir = Assembly.GetExecutingAssembly().Location |> Path.GetDirectoryName
    let config = XDocument.Load(Path.Combine(exeDir, "Synch.config")).Root
    let rootDir = config.Element("root" |> xs).Attribute(pathAttributeName |> xs).Value

    // Resolve each <add path="..." /> entry against the root directory.
    let mapdirs (e : XElement) =
        e.Elements("add" |> xs)
        |> Seq.map (fun (e : XElement) -> Path.Combine(rootDir, e.Attribute(pathAttributeName |> xs).Value))
        |> Seq.toList

    let directories = config.Element("directories" |> xs)
    let masters = directories.Element("masters" |> xs) |> mapdirs
    let slaves = directories.Element("slaves" |> xs) |> mapdirs

    // Read the changed file as raw bytes and write them over the same-named file in every slave directory.
    let merge (fp : string) (fn : string) =
        try
            use fs = File.Open(fp, FileMode.Open, FileAccess.Read, FileShare.Read)

            // The buffer length is an int, so this tops out at Int32.MaxValue bytes.
            let size = fs.Length |> int
            let buffer = Array.zeroCreate<byte> size

            fs.Read(buffer, 0, size) |> ignore

            slaves
            |> List.iter (fun d ->
                let fileName = Path.Combine(d, fn)

                if File.Exists fileName then
                    File.WriteAllBytes(fileName, buffer)

                    printfn "Merged %s to %s at %A %s" fp fileName DateTime.Now Environment.NewLine
                else
                    printfn "File %s did not exist in directory %s. No merge required. Aborting...%s" fileName d Environment.NewLine)
        with
        | :? IOException as ioe ->
            printfn "exception occurred %s %s" ioe.Message Environment.NewLine

    // One watcher per master directory, created once, before we start waiting for input.
    let directoryWatchers =
        masters
        |> List.map (fun d ->
            let w = new FileSystemWatcher(d, EnableRaisingEvents = true, IncludeSubdirectories = true)
            w.Changed.Add(fun e -> merge e.FullPath e.Name)
            w)

    printfn "Started listening at %A..." DateTime.Now

    // Keep the main thread alive until the user types "q".
    while Console.ReadLine() <> "q" do
        ()

    directoryWatchers |> List.iter (fun w -> w.Dispose())

    printfn "Stopped listening at %A..." DateTime.Now
}

let start() = 
    workflow |> Async.RunSynchronously

do start()
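
One way to cope with a file that is still locked when the Changed event fires might be to retry the read a few times before giving up. The following is only a sketch under assumptions, not part of the utility above: readWithRetry, its attempt count, and its delay are hypothetical names and values, and opening with FileShare.ReadWrite assumes the process holding the file tolerates concurrent readers.

```fsharp
open System.IO
open System.Threading

// Hypothetical helper (not in the original utility): read a file's bytes,
// retrying after a short pause whenever the read fails with an IOException.
// Returns None once the attempts are used up.
let rec readWithRetry (attemptsLeft : int) (delayMs : int) (path : string) : byte[] option =
    try
        // FileShare.ReadWrite lets us read even if another process has the file open for writing.
        use fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
        use ms = new MemoryStream()
        fs.CopyTo(ms)
        Some (ms.ToArray())
    with
    | :? IOException when attemptsLeft > 1 ->
        Thread.Sleep(delayMs)
        readWithRetry (attemptsLeft - 1) delayMs path
    | :? IOException ->
        None
```

Inside merge this could replace the File.Open block, for example `match readWithRetry 3 50 fp with Some bytes -> ... | None -> ...`, where 3 attempts and 50 ms are arbitrary picks.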

I'll be back for my last strike later.

Iteration III - 5 April, 2011 3:54 PM

Ok. I spent a few minutes looking around at something I completely ignored to start out with. The line below sets NotifyFilter, which lets you choose which kinds of changes raise events. You can filter events using F#, but this is even simpler. It still doesn't work for me though, because as soon as I open the file, that in and of itself is considered a change.

new FileSystemWatcher(d, EnableRaisingEvents = true, IncludeSubdirectories = true, NotifyFilter = NotifyFilters.LastWrite)
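
For comparison, filtering on the F# side could look something like the sketch below. It relies on Event.filter and Event.add from the F# core library. The isSourceFile and onSourceFileChanged helpers and the extension list are hypothetical, picked only for illustration, and the sketch filters by file extension rather than by the kind of change.

```fsharp
open System.IO

// Hypothetical set of extensions worth merging (assumption, for illustration only).
let sourceExtensions = set [ ".cs"; ".aspx"; ".ascx" ]

// True when the path ends in one of the extensions above, ignoring case.
let isSourceFile (path : string) =
    sourceExtensions.Contains(Path.GetExtension(path).ToLowerInvariant())

// Subscribes handler to only those Changed events that concern source files.
let onSourceFileChanged (watcher : FileSystemWatcher) (handler : FileSystemEventArgs -> unit) =
    watcher.Changed
    |> Event.filter (fun e -> isSourceFile e.FullPath)
    |> Event.add handler
```

With this in place, a watcher would be wired up with `onSourceFileChanged w (fun e -> merge e.FullPath e.Name)` instead of calling `w.Changed.Add` directly.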

I'm throwing in the towel for now, but you have to admire my persistence. I'm sure I could throw in a Sleep call in between reads, but I don't want to hack anything right now. It was kind of fun playing with the FileSystemWatcher and learning its behavior by trial and error. At least I'm fully aware of its potential limitations. Cool :).
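
As an alternative to sleeping between reads, duplicate notifications could be dropped when they arrive too close together. This is a rough sketch of that idea, not something the utility above does: makeDebouncer is a hypothetical helper, and the half-second window in the usage note is an arbitrary guess.

```fsharp
open System
open System.Collections.Generic

// Hypothetical helper: returns a function that answers "should this path be
// processed now?". A path is accepted at most once per window; events for the
// same path that arrive sooner than that after the last accepted one are rejected.
let makeDebouncer (window : TimeSpan) =
    let lastAccepted = Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase)
    let gate = obj ()
    fun (path : string) (now : DateTime) ->
        // FileSystemWatcher raises events on thread pool threads, so guard the dictionary.
        lock gate (fun () ->
            match lastAccepted.TryGetValue(path) with
            | true, previous when now - previous < window -> false
            | _ ->
                lastAccepted.[path] <- now
                true)
```

A handler could then be written as `let shouldProcess = makeDebouncer (TimeSpan(0, 0, 0, 0, 500))` followed by `w.Changed.Add(fun e -> if shouldProcess e.FullPath DateTime.UtcNow then merge e.FullPath e.Name)`. Passing the timestamp in explicitly keeps the helper easy to exercise without a real watcher.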

Source can be found here.
