algorithm - What is the fastest way to join multiple subsets that have similar elements? -
i have list 500+ thousand subsets each having 1 500 values (integers). have like:
{1, 2, 3 } {2, 3} {4, 5} {3, 6, 7} {7, 9} {8, 4} {10, 11}
after running code get:
{1, 2, 3, 6, 7, 9} {4, 5, 8} {10, 11}
i wrote simple code [here] compares each subset each subset, if intersect joined together, else not. ok on small scale, big amount of data takes forever.
please, advise improvements?
p.s. not strong in maths or logics, big o notation greek me. sorry.
you're trying find connected components in graph, each of input sets representing set of nodes that's connected. here's simple implementation:
sets = [{1, 2, 3 },{2, 3},{4, 5},{3, 6, 7},{7, 9},{8, 4},{10, 11}] allelts = set.union(*sets) components = {x: {x} x in allelts} component = {x: x x in allelts} s in sets: comp = sorted({component[x] x in s}) mergeto = comp[0] mergefrom in comp[1:]: components[mergeto] |= components[mergefrom] x in components[mergefrom]: component[x] = mergeto del components[mergefrom]
that results in components having list of components (keyed minimum element), , component storing components each element:
>>> print(components) {1: {1, 2, 3, 6, 7, 9}, 4: {8, 4, 5}, 10: {10, 11}} >>> print(component) {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 1, 7: 1, 8: 4, 9: 1, 10: 10, 11: 10} >>>
Comments
Post a Comment