I remembered reading a comment Matz made once about the ! methods. It went something like "! methods are destructive and should only be used when performance is needed". I found a benchmark example that show sort vs sort! speeds are depended on the garbage collection[1]. Today I was reading a chunk of ruby-prof code and found the comment.
def calculate_thread_times
@result.threads.each do |thread_id, methods|
top = methods.sort.last
thread_time = 0.01
thread_time = top.total_time if top.total_time > 0
@thread_times[thread_id] = thread_time
end
end
I rewrote the code with a sort! in my head but wonder would it really have any affects on the time? A little context for test. I just used ruby-prof in an around filter in the application.rb file for a rails project. The html file that ruby-prof produces is not small and includes parts of the rails framework, mysql library, and core ruby code.
require "benchmark"
def calculate_thread_times!(result)
thread_times={}
result.threads.each do |thread_id, methods|
methods.sort!
thread_time = methods.last.total_time > 0 ? methods.last.total_time : 0.01
thread_times[thread_id] = thread_time
end
end
def calculate_thread_times(result)
thread_times ={}
result.threads.each do |thread_id, methods|
top = methods.sort.last
thread_time = 0.01
thread_time = top.total_time if top.total_time > 0
thread_times[thread_id] = thread_time
end
end
Benchmark.bm(12) do |test|
test.report("no_bang:") do
lresult = result
calculate_thread_times(lresult)
end
test.report("bang:") do
lresult = result
calculate_thread_times!(lresult)
end
end
Clearly not enough improvement to stop caching thread times but, now the methods are in order. We could stop caching and access the thread time like
@result.threads[thread_id].last
Since the calculate_thread_times is called when you create a new RubyProf::GraphHtmlPrinter you should never need to call sort again. For example in the ERB template. I did find that using Benchmark.bmbm cause the real values to be swapped just like in the rdoc example[1].
I just submitted a patch to the ruby prof ruby forge group[2]. It is not a big improvement over all. Using my numbers above the current GraphHtmlPrinter use 0.013572 just for sorting. It should now only take 0.004242. Saving the world most a second....
[1] http://www.ruby-doc.org/core/classes/Benchmark.html
[2] http://rubyforge.org/tracker/index.php?func=detail&aid=16364&group_id=1814&atid=7062I found my self need to find the greatest number in an array of random numbers as fast as I can. Which sounds like a comp sci 101 problem visit every node and keep track of the highest number. I came up with a few different ways to do this with ruby's arrays and set up benchmarking.
n = 1000
marray = Array.new(6000)
range = (0...6000)
range.each{|i| marray[i]=rand(4000000)}
require "benchmark"
Benchmark.bmbm(1) do |test|
test.report("Sort!:") do
n.times{
array =Array.new(marray)
array.sort!
array.last
}
end
test.report("max:") do
n.times{
array =Array.new(marray)
array.max
}
end
test.report("Sort:") do
n.times{
array =Array.new(marray)
array.sort.last
}
end
test.report("range index:") do
n.times{
max=0
array =Array.new(marray)
range.each{|x| max=array[x] if array[x]>max}
}
end
test.report("index:") do
n.times{
max=0
array =Array.new(marray)
array.each_index{|x| max=array[x] if array[x]>max}
}
end
test.report("size indexing:") do
n.times{
max=0
array =Array.new(marray)
(0...array.size).each{|x| max=array[x] if array[x]>max}
}
end
test.report("each:") do
n.times{
max=0
array =Array.new(marray)
array.each{|x| max=x if x>max}
}
end
test.report("uniq each:") do
n.times{
max=0
array =Array.new(marray)
array.uniq!
array.each{|x| max=x if x>max}
}
end
test.report("pop:") do
n.times{
max=0
array =Array.new(marray)
while x=array.pop
max = x if x>max
end
}
end
end
And the winner is Schwartzian transform with max as a close second!
Some might say its not fair. That I changed the out come by measuring it, but I had to do it. Thats right sorting a random array to find the largest value of that array is the quickest way I can find. I have also tried this with 60k and 600k arrays with the same relative speeds. Since Ruby 1.8 it has used the Schwartzian transform[1] for its sort method. Which is clearly faster then each but why is each_index twice as slow as each? I thought the size indexing test was slower because of the range object creation but I tried it with a precompiled range and it still was 8 seconds.
[1] http://en.wikipedia.org/wiki/Schwartzian_transformMy world has just been turned upside down. I need to start benchmarking some real data. I am using large arrays of random. Anyway, after trying to find the max number in array I found using the each method was slower then sorting. I then wondered if different types of iteration would be faster. So after some wings for breakfast at the Seattle airport I created another benchmark.
n = 10000
start_array = Array.new(6000)
(0...6000).each{|i| start_array[i]=rand(4000000)}
require "benchmark"
Benchmark.bm(1) do |test|
test.report("each:") do
n.times{
start_array.each{|x|
x
}
}
end
test.report("for:") do
n.times{
for x in start_array
x
end
}
end
test.report("each_index:") do
n.times{
start_array.each_index{|i|
start_array[i]
}
}
end
test.report("range each:") do
n.times{
(0...start_array.size).each{|i|
start_array[i]
}
}
end
test.report("each_with_index:") do
n.times{
start_array.each_with_index{|x,i|
x
}
}
end
end
And the results?
If I left out any ways to iterate please let me know.
So there are many different ways to tell if an array includes an object. The results will change depending on where in the array the object is and if the object is in the array or not. A I once again used a random array to benchmark a few ways to do this.
n = 10000
start_array = Array.new(6000)
(0...6000).each{|i| start_array[i]=rand(4000000)}
require "benchmark"
Benchmark.bm(1) do |test|
test.report("include?:") do
n.times{
start_array.include?(5)
}
end
test.report("for loop:") do
n.times{
found = false
for x in start_array
if x == 5
found = true
break
end
end
}
end
test.report("any:") do
n.times{
start_array.any?{|x| x==5}
}
end
test.report("find:") do
n.times{
true if start_array.find{|x| x==5}
}
end
test.report("select:") do
n.times{
true if start_array.select{|x| x==5}
}
end
end
The results?
I ran the test again but this time I inserted 5 at index 200. No shock they all cut out when they find the object they are looking for.