









|
[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
[NMLUG] Efficiently modeling sets and subsets in lattice-like structure
What type of efficiency are you looking for? Computational, storage, speed
of retrieval, ease of programming/queries/maintenance?
As an initial approach I'd just use a lot of recursion. Using a decent ORM
will make the coding easier and clearer, and probably not add too much
overhead (depending on the efficiency of the ORM you're using).
Here's a possible data model using Django's ORM (which I've been working
with a lot recently). Each of these has an implicit primary key named id
which is either unsigned int auto_increment primary key (in mysql) or serial
(postgresql). I'd then check performance, and check out the generated SQL
and start optimizing from there.
class Value(models.Model):
value = models.IntegerField()
class Set(models.Model):
name = models.CharField(maxlength=255)
values = models.ManyToManyField(Value, related_name="sets")
subsets = models.ManyToManyField("self", related_name="parent_sets")
#Add in a function to get all members of a set.
def get_members(self):
ret_values = [val.value for val in self.values.all()]
for subset in self.subsets.all():
ret_values.extend(subset.get_members())
return ret_values
def get_all_parent_sets(self):
ret_values = self.parent_sets.all()
for set in self.parent_sets.all():
ret_values.extend(set.get_all_parent_sets())
return ret_values
So to define the data in your example, the code could be something like:
set = Set()
set.name = "X"
for v in [7,11,15]:
val = Value.objects.get_or_create(value=v)
val.save()
set.values.add(val)
for s in ["Y"]:
newset = Set.objects.get_or_create(name=s)
newset.save()
set.subsets.add(newset)
set.save()
To do the things you wanted:
% Add or remove members/subsets from a set and have the changes
"bubble up" the lattice efficiently.
To remove the value 10 from set "Y" and have it bubble up:
set = Set.objects.get(name="Y")
set.values.remove(Value.objects.get(value=10))
set.save()
% Find all members of a set X, efficiently traversing subsets. In
other words, find all nodes/leaves of the subtree rooted at X
For this, just using recursion would work (using the get_members member
function of the Set class):
values = Set.objects.get(name="X").get_members()
% For a given set X, find all sets that contain X as a subset,
directly or indirectly (eg, if Y contains X, and Z contains Y, return
both Y and Z). In other words, find all the nodes that have X as a
sub-node.
Same thing, recursion using the get_all_parents member function:
all_parents = Set.objects.get(name="X").get_all_parents()
You could also do something similar to get all sets that either directly or
indirectly contain a specific value in much the same manner.
This approach is fairly naive, and is not space/memory efficient because
you're using a model for values, but you get the advantage of DB indexing.
There's probably a better approach, but this could give you something to
start with before the "correct" way to do it hits you in the middle of the
night.
I haven't actually tested this code, so it might not work as written, but
it's pretty close.
Hope it gives you some ideas. Also, using select_related in the queries
would speed it up a little since it does a "deep" query to select all
related objects at the same time instead of when they're accessed. So it
will do 1 large query instead of several small queries.
Let me know if you would like some benchmarks or anything, and don't feel
like setting up your own dev environment just to test feasibility.
--
Matthew Bowie
Programmer/IT Consultant
niosop at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.b9.com/pipermail/nmlug/attachments/20070530/655059fe/attachment.htm
|
|