Wednesday, July 31, 2013

Help! How can I ensure I get the encoding right?

I've been struggling with character encoding.
Some tests in our suite use strings like "100µl" or "N°". The problem was that these tests were getting corrupted as they were pushed into Quality Center. I found a workaround through trial and error, but someone must know why it works.
Here are some experiments. I saved the text "µ°" in 3 files: one is ANSI, one is UTF-8 and one is "ANSI as UTF-8" (so Notepad++ tells me). Then I ran the following code:

# encoding: utf-8

utf8 =  File.read("files/utf8.txt", external_encoding:"UTF-8")
aautf8 =  File.read("files/ascii_as_utf8.txt")
ascii =  File.read("files/ascii.txt")

puts "UTF8   >> " + utf8
puts utf8.encoding.names.inspect

puts "AAUTF8 >> " + aautf8
puts aautf8.encoding.names.inspect

puts "ASCII  >> " + ascii
puts "ASCII  >> " + ascii.bytes.pack("U*")
puts ascii.encoding.names.inspect

This produces the following (assuming you are on Windows with Ruby 2.0 and a Lucida font, and you whispered the magic incantation "chcp 65001"):
UTF8   >> µ°
["UTF-8", "CP65001", "locale", "external"]
AAUTF8 >> µ°
["UTF-8", "CP65001", "locale", "external"]
ASCII  >> ��
ASCII  >> µ°
["UTF-8", "CP65001", "locale", "external"]
So I guess my questions are: How are you supposed to load a file and get it to appear correctly? Was that last "ASCII" line a fluke? And how do you tell whether the file has been loaded correctly or not?
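
Here is a minimal sketch of one way to approach those questions. It assumes the "ANSI" file is really Windows-1252 (the usual meaning on a Western European Windows install, but an assumption all the same), and it uses a temp file as a stand-in for files/ascii.txt. String#valid_encoding? answers the "is it loaded correctly?" part, and naming the real encoding when reading lets Ruby transcode to UTF-8. As for the pack("U*") line: it treats each byte as a Unicode code point, and because the first 256 code points mirror ISO-8859-1 it happens to work for µ and °. It would not work for Windows-1252 characters in the 0x80 to 0x9F range, such as €.

```ruby
require "tempfile"

# Hypothetical stand-in for files/ascii.txt: "µ°" saved as single-byte ANSI.
ansi = Tempfile.new("ansi")
ansi.binmode
ansi.write([0xB5, 0xB0].pack("C*"))
ansi.close

# Labelled as UTF-8, the bytes are read untouched and fail validation.
raw = File.read(ansi.path, encoding: "UTF-8")
puts raw.valid_encoding?     # false: 0xB5 0xB0 is not a valid UTF-8 sequence

# Name the real encoding (assumed Windows-1252) and transcode while reading.
fixed = File.read(ansi.path, encoding: "Windows-1252:UTF-8")
puts fixed.valid_encoding?   # true
puts fixed.encoding          # UTF-8

# Same result after the fact: relabel the bytes, then transcode.
relabelled = raw.dup.force_encoding("Windows-1252").encode("UTF-8")
puts relabelled == fixed     # true
```

Note that valid_encoding? only says the bytes are legal for the label they carry. It cannot tell you that the label is the one the author of the file intended.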

Make Your Function Work Like a Collection

tl;dr: Kernel#enum_for lets you treat your method like a collection.

I was iterating over the files in a folder, looking for those with a specific file extension. I had something like this...

  def for_all_files(f, e, &block)
    Dir[f+"/*"].each { |file| 
      if(File.directory?(file))
        for_all_files(file, e, &block)
      else
        block.call(file) if file.end_with?(e)
      end
    }
  end

Now I'm sure Ruby provides a better way to achieve this, but it worked for me... until I found I was filtering this list again in the block I was passing in.

  for_all_files(source_folder, ".ts") do |file|
    if(path_filter =~ file)
       ...
    end
  end

What I wanted was something like this:
  files(source_folder, ".ts").
    select{|file| path_filter =~ file}.each do |file|
    ...
  end

Well, as per usual, Ruby 2.0 has already thought of that.
 
  def files(folder, ending)
    def for_all_files(f, e, &block)
      Dir[f+"/*"].each { |file| 
        if(File.directory?(file))
          for_all_files(file, e, &block)
        else
          block.call(file) if file.end_with?(e)
        end
      }
    end
    enum_for(:for_all_files, folder, ending)
  end

Note the enum_for. It takes a symbol for a method name and a set of arguments, and returns an Enumerator (which includes Enumerable) that wraps your method. I really like the result of this, but I don't like the code. If you know a better way, post a comment telling me how I should have written it.
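
One possible tidy-up, offered as a sketch rather than the definitive answer: a common Ruby idiom is to have the method return its own enumerator when no block is given, which removes the nested def. The temp folder and the file names below are made up purely so the example runs on its own.

```ruby
require "tmpdir"
require "fileutils"

# Yields every file under `folder` whose name ends with `ending`.
# Called without a block, it returns an Enumerator over the same files.
def files(folder, ending, &block)
  return enum_for(:files, folder, ending) unless block_given?

  Dir[File.join(folder, "*")].each do |path|
    if File.directory?(path)
      files(path, ending, &block)   # recurse, passing the caller's block down
    elsif path.end_with?(ending)
      yield path
    end
  end
end

# Hypothetical folder layout, just for the demonstration.
root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(root, "sub"))
FileUtils.touch(File.join(root, "a.ts"))
FileUtils.touch(File.join(root, "b.txt"))
FileUtils.touch(File.join(root, "sub", "c.ts"))

all_ts = files(root, ".ts").map { |f| File.basename(f) }.sort
in_sub = files(root, ".ts").select { |f| f =~ /sub/ }.map { |f| File.basename(f) }
puts all_ts.inspect
puts in_sub.inspect
```

If an Array is good enough, Dir.glob(File.join(folder, "**", "*" + ending)) does the recursive search in one call.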

Also, I think this means I can make it lazy, like this:
  files(source_folder, ".ts").
    lazy.
    select{|file| path_filter =~ file}.
    each do |file|
    ...
  end

Nice!
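
To check that lazy really does stop early, here is a small sketch using a made-up endless generator in place of the file walk. The method name numbers and the log array are assumptions for illustration only. Without lazy, the select would never return.

```ruby
# Hypothetical endless source: yields 1, 2, 3, ... and records each value
# in `log` so we can see how far the iteration actually went.
def numbers(log)
  return enum_for(:numbers, log) unless block_given?

  n = 0
  loop do
    n += 1
    log << n
    yield n
  end
end

log = []
evens = numbers(log).lazy.select(&:even?).first(3)
puts evens.inspect
puts "visited #{log.length} values"
```

Only as many values are generated as are needed to find three matches, so the same chain over files should stop walking the folder as soon as the consumer stops asking.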


Update:
On a similar theme, I was wrapping up a COM API that exposes its collections through Count and Child(x) methods. With Enumerator.new you can easily wrap these to expose a clean Ruby API.
  def child_folders(folder)
    Enumerator.new do |y|
      (1..folder.Count).each {|i| y << folder.Child(i)}
    end
  end
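
Here is a runnable sketch of that wrapper in use. FakeFolder is a hypothetical stand-in for the COM object (1-based Child(i) plus Count), since the real API is not available outside Quality Center.

```ruby
# Hypothetical stand-in for a COM collection: Count plus 1-based Child(i).
class FakeFolder
  def initialize(names)
    @names = names
  end

  def Count
    @names.length
  end

  def Child(index)
    @names[index - 1]
  end
end

def child_folders(folder)
  Enumerator.new do |y|
    (1..folder.Count).each { |i| y << folder.Child(i) }
  end
end

folder = FakeFolder.new(%w[Smoke Regression Archive])
names = child_folders(folder).reject { |n| n == "Archive" }.map(&:upcase)
puts names.inspect
```

Once the collection is an Enumerator, all of Enumerable (map, select, reject, each_slice and friends) comes for free.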

Monday, July 29, 2013

Sidestepping Windows NTFS Path Length Limitations

I was trying to compare a set of tests in Quality Center (QC) against some test scripts I had on a network drive. I decided to dump the QC tests to folders, so I could use "Beyond Compare" to make the comparison.

Problem:

The problem I hit, however, was path length. Although NTFS itself allows very long path names (around 32,767 characters), the ordinary Windows path APIs stop at MAX_PATH, which is 260 characters including the drive letter and the terminating null.

Lots of the tests, and most of the folders, in our test tree have long names, meaning we were easily hitting this limit. 

Solution:

Remembering that NTFS allows symbolic links (through the "mklink" command), I decided to:
  • For each folder in QC
    • Create a folder C:\Dump\[QC folder id]
    • Create a symlink in its parent folder (C:\Dump\[QC parent folder id]\[full name of the folder]) to the C:\Dump\[QC folder id]
  • Dump the tests in the C:\Dump\[QC folder id] folders, naming the files to match the test names. 
This gave me a tree of symlink folders I could navigate that matched the QC folders exactly. Beyond Compare 3.2+ has an option to follow symlinks, which makes this all useful.
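
The steps above can be sketched as a plan builder. This is not the export script itself: QcFolder, dump_plan and the sample folders are hypothetical names, and treating a folder with no parent as the root (which gets no link) is an assumption. It only computes what would be created, so it runs anywhere.

```ruby
# Hypothetical record for a QC folder: its id, its parent's id, its long name.
QcFolder = Struct.new(:id, :parent_id, :name)

# Returns the list of steps: one short real folder per QC folder, plus a
# symlink under the parent's real folder carrying the full QC name.
def dump_plan(folders, root = "C:\\Dump")
  folders.flat_map do |f|
    dir = "#{root}\\#{f.id}"
    steps = [[:mkdir, dir]]
    # Assumption: a folder without a parent is the root and gets no link.
    steps << [:symlink, "#{root}\\#{f.parent_id}\\#{f.name}", dir] if f.parent_id
    steps
  end
end

folders = [
  QcFolder.new(1, nil, "Subject"),
  QcFolder.new(2, 1, "Regression tests with a very long descriptive name")
]
plan = dump_plan(folders)
plan.each { |step| puts step.inspect }
```

The point of the design is that every real path is just the root plus an id, so the long names only ever appear one level deep, as link names.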

Gotchas:

My export script is written in Ruby. I hit a problem trying to execute `mklink /D #{from} #{to}` directly from Ruby. In the end I made a simple bat file:

mklink /d %1 %2

which I called from Ruby:

puts `link.bat "#{from}" "#{to}"`

See the source at https://gist.github.com/NigelThorne/6102634

