ruby - Faster cycle detection in a Directed Acyclic Graph? -
ruby - Faster cycle detection in a Directed Acyclic Graph? -
i have programme in ruby 1.9.3 constructs rubytree. info best described beingness directed acyclic graph (dag); please note not polytree. well, @ to the lowest degree info should dag, despite users' best efforts foil programme bad data.
i build dag dynamically parsing xml document. xml document not explicitly specify tree structure, provide cross-references of integer ids found links between elements in document.
i need ensure rubytree not contain cycles. source info may (erroneously) have cycle, , if does, programme needs aware of it, rather entering infinite loop or crashing. accomplish currently, mixed in ruby standard library's tsort module rubytree's tree::treenode
class. uses tarjan's algorithm perform topological sort on graph each time node added. during topological sort, if cycle detected, exception raised -- want.
for example:
module tree class treenode include tsort def tsort_each_node(&block) self.each(&block) end def tsort_each_child(node, &block) node.get_children().each { |child| yield kid } end def add(child, at_index = -1) #the standard rubytree implementation of add together goes here begin self.tsort() rescue tsort::cyclic => exce self.remove!(child) raise exce end homecoming kid end end end
i had modify few other methods too. needs traverse tree or children needed have tsort implemented, or rid of dependence on traversing (for instance, simplified tree::treenode#to_s()
homecoming tree::treenode#name
.)
right now, programme functionally correct. i've done important testing , results working fine: have rescue tsort::cyclic
@ right points in code, , if ever seek add together node causes cycle, node gets removed , can log problem in study deal later (by fixing source data).
problem is, on rubytree of size 75000 or so, number of edges close beingness equal number of vertices minus 1, iteratively running tarjan's algorithm produces algorithmic complexity looks pretty quadratic. tarjan's o(|v| + |e|)
, in case o(2*|v|)
, each time phone call add()
, |v|
increases 1, build graph node node. can't phone call tarjan's @ end, because might need traverse graph, or parts of it, during read-compare-add loop, , traversal effort may hang programme or crash if there in fact cycle. (it goes without saying code single-threaded; if weren't, we'd have huge problem. stands, rely upon fact add()
never returns without raising exception if there's cycle, , if there is cycle, node gets removed in such way clear out cycle before add()
returns.)
but it's slow! it's taking more half hr 1 loop, , programme consists of several other steps take own fair share of time. stands, performing tarjan's eating lion's share of performance, judging results of ruby-perf
. tried switching arrays linked lists in rubytree each
implementation, cutting downwards runtime 1% removing loads of array#concat
calls.
i found this awesome paper the tarjan invented connected components algorithm ruby's tsort
relies upon, , seems incremental cycle detection active area of research. however, level of dialogue in paper on head, , i'm pretty sure lack mathematical background translate paper's findings ruby code. not that, reading remarks section of paper, seems best effort algorithms have rather worrisome worst-case runtimes of own, may not faster current method, depending on specifics of data.
am missing silly here, or best bet set in mental legwork analyze tarjan's paper , seek come ruby implementation of 1 of algorithms? note don't particularly care topological sorting aspect of algorithm; it's side effect of want. if tree not topologically sorted still had guarantee of no cycles, happy.
also worth noting cycles somewhat rare in source data. is, cycles can happen due manual error in info entry process, never happen intentionally, , should reported programme can tell me, can beat on head billyclub entering wrong data. additionally, programme absolutely must maintain on chugging if detects particularly nasty cycle, can't stick head in sand , hope there won't cycles.
what's actual problem?as requested some, here's demonstration can run see issue @ work.
install stable version of rubytree (i utilize mri 1.9.3). compare output of these 2 programs:
exhibit 1: hangs forever 100% cpu usage on main thread after "third time" printedrequire 'tree' = tree::treenode.new('a', nil) b = tree::treenode.new('b', nil) c = tree::treenode.new('c', nil) a.add(b) a.add(c) puts "first time" b.add(c) puts "second time" b.add(a) puts "third time" c.add(b) puts "fourth time" c.add(a) puts "fifth time" puts "done"
exhibit 2: goes way through , prints "done", , result has no cycle note doing stuff within rescue
blocks log cycle(s) happened, , complaining loudly @ human offenders created these cycles.
require 'tree' require 'tsort' module tree class treenode include tsort def tsort_each_node(&block) self.each(&block) end def tsort_each_child(node, &block) node.get_children().each { |child| yield child} end def to_s name end def get_children() homecoming @children end def add(child, at_index = -1) unless kid raise argumenterror, "attempting add together nil node" # handles immediate kid scenario end if self.equal?(child) raise tsort::cyclic, "cycle detected: [#{child.name}, #{child.name}]" end # lazy man's unique test, won't test if children of kid unique in tree too. if @children_hash.include?(child.name) raise "child #{child.name} added!" end if insertion_range.include?(at_index) @children.insert(at_index, child) else raise "attempting insert kid @ non-existent location (#{at_index}) when positions #{insertion_range.min} #{insertion_range.max} exist." end @children_hash[child.name] = kid child.parent = self #cycle detection - raises tsort::cyclic if caused cycle begin self.tsort() rescue tsort::cyclic => exce self.remove!(child) raise exce end homecoming kid end end end = tree::treenode.new('a', nil) b = tree::treenode.new('b', nil) c = tree::treenode.new('c', nil) a.add(b) a.add(c) puts "first time" b.add(c) puts "second time" begin b.add(a) rescue end puts "third time" begin c.add(b) rescue end puts "fourth time" begin c.add(a) rescue end puts "fifth time" puts "done"
the goal, me, develop code functionally equivalent exhibit 2, scales improve larger numbers of vertices (i don't anticipate having more 10^6 vertices, , in case fine taking several minutes ("go cup of coffee") on modern desktop workstation, not hours or longer.)
the plexus gem ruby seems have resolved worst of problems. tried gratr before, wouldn't load because wasn't compatible ruby 1.9.3, plexus fork of fork of gratr works 1.9.3.
my problem was using info construction (rubytree) wasn't designed handle cycles, plexus digraph can maintain working cycles. api designed them in mind.
the solution went pretty simple: basically, graph info construction doesn't hang on cycles, can phone call tarjan's algorithm @ end of graph building routine -- actually, there's nice wrapper acyclic?
method, calls topsort()
under hood, , implements topological sort using tarjan's connected components algorithm, much ruby's stdlib's tsort
. utilize own implementation instead of tsort
, though. unsure why.
unfortunately, i'm stuck challenge of developing implementation of minimum feedback arc set problem (minimum fas problem) np-hard. minimum fas problem required because need remove to the lowest degree intrusive number of arcs in graph create acyclic.
my plan right connected components list plexus, array of arrays; if of second-level arrays contain more 1 element, array describes element cycle, based on definition of connected component. have (using minimum fas or approximation) remove edges and/or vertices create graph acyclic, , iteratively run tarjan's until every scc subarray has length 1.
i'm thinking brute forcefulness best approach solving minimum fas: don't need clever because number of nodes in scc in info set never exceed, say, 5 or 6. exponential on 5 or 6 fine. severely uncertainty i'll have scc set of many hundreds of nodes tens of different cycles in it; extremely pathological worst case not think ever happen. if did, though, runtime quite long.
basically need seek removing powerfulness set of arcs of graph, 1 subset @ time set of subsets sorted ascending subset size, , "guess , check" whether graph still cyclic (tarjan's), add together edges if powerfulness set doesn't prepare cycle.
if number of edges , nodes under 20 or so, guaranteed, won't take important runtime.
removing iterative tarjan's solved complexity problem in happy path (no cycles or 1 trivial cycle), giving me heartburn -- instead of taking 25 minutes build graph, takes 15 seconds.
lesson learned: if programme slow, it's because you're doing lots of unnecessary work. in case, unnecessary work performing tarjan's topological sort upon each add-on of new vertex graph, required because of implementation details of library chose model data.
ruby algorithm cycle digraphs
Comments
Post a Comment