Tuesday, June 26, 2007

Comparing thousands of file names

This week I am moving all my files from one server to another. I use Filezilla to download them to my machine, and will later upload them to the new server. Unfortunately, the connection gets broken from time to time and I do not remember which folders have been downloaded and where I broke off. What to do?

First I looked for existing solutions and found Beyond Compare. I spent half an hour trying to figure it out, but gave up. It was probably too advanced for me; I just wanted to check whether all the files had been downloaded. I couldn't care less about file size, date stamp, CRC verification, ...

So I wrote my own PHP script. I found code that goes through all the files in all subfolders of a given folder and adds up their sizes. Not what I wanted, but easy enough to edit to my needs. The code used a stack (an array) rather than recursion to reach all the folders. It used array_shift(), which removes the first element of the array, to fetch the next folder to process.
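The non-recursive traversal can be sketched roughly as follows. This is a Python stand-in, not the original PHP script; the function name list_files and the choice to return relative paths with forward slashes are assumptions made for illustration.

```python
import os

def list_files(root):
    """Return the relative paths of all files under root, without recursion."""
    pending = [root]  # folders still waiting to be visited
    found = []
    while pending:
        # Take the next folder from the front, as PHP's array_shift() would.
        current = pending.pop(0)
        for entry in os.scandir(current):
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)  # visit this subfolder later
            elif entry.is_file():
                rel = os.path.relpath(entry.path, root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Calling list_files on the download folder gives one path per file, ready to be written to a table. Because folders are taken from the front of the list, they are visited level by level; taking them from the end instead would give a depth-first walk with the same final result.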

I ran it on the server and saved the paths to a MySQL table. Then I exported the table, imported it into my local MySQL, and edited the code to fill a second table with the downloaded paths. Finally I ran a query to show which files were missing either locally or on the server.
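A comparison query of this kind might look like the sketch below. It uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own; the table names server and local, the column name path, and the function name missing_paths are all assumptions, not the names used in the original script. The same LEFT JOIN ... IS NULL pattern is also accepted by MySQL.

```python
import sqlite3

def missing_paths(server_paths, local_paths):
    """Return (server, local) rows for paths that exist on only one side."""
    db = sqlite3.connect(":memory:")
    # PRIMARY KEY also indexes the path column, which keeps the joins fast.
    db.execute("CREATE TABLE server (path TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE local (path TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO server VALUES (?)", [(p,) for p in server_paths])
    db.executemany("INSERT INTO local VALUES (?)", [(p,) for p in local_paths])
    rows = db.execute("""
        SELECT s.path, l.path FROM server s
        LEFT JOIN local l ON l.path = s.path
        WHERE l.path IS NULL          -- on the server, not downloaded
        UNION ALL
        SELECT s.path, l.path FROM local l
        LEFT JOIN server s ON s.path = l.path
        WHERE s.path IS NULL          -- local only, not on the server
    """).fetchall()
    db.close()
    return rows
```

Each returned row has a path in one column and NULL (None in Python) in the other, which matches the shape of the result listed further down.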

The server table had 8190 records and the local one 8090. After I indexed the path fields, the query ran fast and displayed 297 records:

server                local
dirsize.php           NULL
pmachinefree/1.jpg    NULL
pmachinefree/2.jpg    NULL
pmachinefree/3.jpg    NULL
pmachinefree/4.jpg    NULL
---
NULL                  pmachinefree/images/uploads/påske.jpg
NULL                  pmachinefree/images/uploads/påske22.jpg

2 comments:

Anonymous said...

It's a nice solution, but I think using an FTP client that supports resuming would have been easier :)

damezumari said...

Filezilla supports resuming, but I think the queue is lost when I turn off the machine.

Anyway, the query only came up with newer files not being downloaded, so I was doing OK.