Just curious but why? Wouldn't unique keys within partitions mean part-key + par...

da_chicken · on Jan 20, 2018

If I'm understanding everything right, the uniqueness is only checked within each partition because only the partitions can have unique indexes. If you're partitioning by month and have an id field that needs to be unique to the table, you could conceivably have the same id show up in different months. Just because I partition by months doesn't mean I don't sometimes need the whole table, too. I might usually only need monthly data, but sometimes I might need a full history. Now if I've got a duplicate I'm fucked and I don't even know it.
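
A rough sketch of that failure mode, using Python's built-in sqlite3 module rather than real PostgreSQL partitions. The two tables stand in for monthly partitions, and the names `events_2018_01` and `events_2018_02` are made up for illustration. Each table enforces uniqueness on `id` locally, yet the same id lands in both without complaint:

```python
import sqlite3

# Two hypothetical "monthly partitions", modeled as separate tables.
# Each one enforces uniqueness on id, but only within itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_2018_01 (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("CREATE TABLE events_2018_02 (id INTEGER PRIMARY KEY, note TEXT)")

# The same id goes into both months; neither local index objects.
conn.execute("INSERT INTO events_2018_01 VALUES (42, 'january row')")
conn.execute("INSERT INTO events_2018_02 VALUES (42, 'february row')")

# Reading the "whole table" (all partitions together) exposes the duplicate.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM (
            SELECT id FROM events_2018_01
            UNION ALL
            SELECT id FROM events_2018_02
        )
        GROUP BY id
        HAVING COUNT(*) > 1
        """
    )
]
print(duplicate_ids)  # [42]
```

Nothing fails at insert time; the duplicate only shows up if someone goes looking for it across all partitions.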

gshulegaard · on Jan 22, 2018

This is a good point, but your partition space implicitly has a unique compound key (part_unique_id, part_key). In almost any case I can think of compound unique keys are probably sufficient, but if you really must have a single UUID, then you could combine these two keys to create a unique identifier.

Say new_key = <part_unique_id>-<part_key>. Now new_key is guaranteed unique across the partition space. You could consider hashing as well...although I don't recommend this since hashes can't guarantee the absence of collisions (even if the chances of collisions are small for most modern algorithms).
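
A minimal sketch of that combined key, assuming an integer per-partition id and a month string as the partition key. The helper name `make_global_key` and the sample rows are hypothetical:

```python
def make_global_key(part_unique_id: int, part_key: str) -> str:
    """Join the per-partition id and the partition key into one string."""
    return f"{part_unique_id}-{part_key}"


# The same per-partition id (1) appears in two different months.
rows = [(1, "2018-01"), (1, "2018-02"), (2, "2018-01")]
keys = [make_global_key(part_id, part_key) for part_id, part_key in rows]

print(keys)  # ['1-2018-01', '1-2018-02', '2-2018-01']
print(len(set(keys)) == len(keys))  # True
```

This only holds as long as each (part_unique_id, part_key) pair is itself unique and the joined form can be split back unambiguously, which is the case here because the id is a plain integer placed before the first dash.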

da_chicken · on Jan 22, 2018

I don't see how a compound key does anything here.

Say my table is a Student table: StudentId, Building, Name, Birthdate, Gender, Grade, Status. I want to partition by Status so that Active students are together, but StudentId must be unique across the entire district.

gshulegaard · on Jan 22, 2018

I am not sure this case really fits what I thought your point was, but a shared sequence would help here:

    CREATE SEQUENCE student_id_seq
        START WITH 1
        INCREMENT BY 1
        NO MINVALUE
        NO MAXVALUE
        CACHE 1;

And anywhere you create a student you would set the default to be the next value in the sequence:

    ALTER TABLE student  -- placeholder table name, not from the thread
        ALTER COLUMN id SET DEFAULT nextval('student_id_seq'::regclass);

Although, in this example, I am pretty sure a valid limitation of the model could be that students must be created with "Active" status. That confuses me slightly, though: you wouldn't really partition by status, since updating the status of a student is an action, so I would move the user record at that time. And that is no longer partitioning per se.

But this is one (probably naive) way to handle this case.

da_chicken · on Jan 23, 2018

A sequence isn't a unique key. It's not data integrity. It's just a sane default. There's nothing stopping an application from inserting a specific value, and so, without a unique constraint, there's nothing stopping an application from inserting a duplicate value. There's a world of difference between a sane default and an enforced constraint. Half the point of an RDBMS is that the database cannot store inconsistent or invalid data. You mark a column as unique and you don't have to worry about it again. You can't store a duplicate value unless the database is corrupt.

Yes, if the applications using the database have no bugs and always work as expected, then you won't have any duplicates. However, that line of reasoning leads to just 100% trusting everything the application does regardless of the design of the data model. That's exactly how data stores used to work before RDBMSs, and it's exactly why RDBMSs came about: applications can't be trusted to manipulate data consistently to a known set of rules. Somebody will mess it up somewhere, so it's important to enforce rules to leave the database in a manner that other applications (or other parts of the same application) will find comprehensible.
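
To make the default-versus-constraint distinction concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite has no sequences, so `itertools.count` stands in for one; the table names and the `insert_student` helper are invented for illustration:

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one relies on a default only, the other adds UNIQUE.
conn.execute("CREATE TABLE student_default_only (student_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE student_constrained (student_id INTEGER UNIQUE, name TEXT)")

# Stand-in for a database sequence: hands out 1, 2, 3, ...
student_id_seq = itertools.count(1)


def insert_student(table, name, student_id=None):
    """Insert a row, taking the next sequence value unless an id is supplied."""
    if student_id is None:
        student_id = next(student_id_seq)
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (student_id, name))
    return student_id


# Default only: an explicit id bypasses the sequence and is accepted silently.
insert_student("student_default_only", "Ada")     # takes id 1 from the sequence
insert_student("student_default_only", "Bob", 1)  # explicit id 1, no error
dupes = conn.execute(
    "SELECT COUNT(*) FROM student_default_only WHERE student_id = 1"
).fetchone()[0]

# With a UNIQUE constraint, the database itself rejects the second row.
insert_student("student_constrained", "Ada", 1)
try:
    insert_student("student_constrained", "Bob", 1)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(dupes, rejected)  # 2 True
```

The sequence only helps callers who choose to use it; the constraint applies to every writer.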

gshulegaard · on Jan 24, 2018

> It's not data integrity.

This is true. And while you could claim it is a "non-starter" for using partitions, I would argue that app-logic guarantees about how that column is used are sufficient in many cases to use partitions in their current state.

You have good points, and I have no disagreement about data integrity constraints or whether or not partition-wide uniqueness guarantees are a good feature.

It would seem you and I simply disagree on whether partitions without uniqueness guarantees are unusable for most use cases. I believe many partition use cases don't need uniqueness guarantees (such as high-volume, low/no-update workloads). And in quite a few cases where uniqueness is desirable, though definitely not all, it could be handled satisfactorily in app logic.

But again, the key point of agreement is that partition-wide uniqueness guarantees are a good feature.

philliphaydon · on Jan 21, 2018

If you're partitioning by month, how would the month end up in 2 partitions? I mean, how would March end up in April or something!???

e12e · on Jan 22, 2018

Think: TransactionId, ItemId, BuyerID, Count, date (1, 1, 1, 2, 1/1/2018; 1, 2, 1, 2, 1/2/2018)

Where TransactionId should be globally unique?

[ed: see https://news.ycombinator.com/item?id=16196096 for a much better/comprehensive example of the same idea]

gshulegaard · on Jan 22, 2018

See my response above :)

e12e · on Jan 23, 2018

I agree that you would normally end up with a compound unique key, but it might have some benefits to know that your transaction ID was globally unique. Not just per month/weekday, but also across geographic regions and stores (e.g., to avoid the need for rewrites/legacy support on store merges/splits, ditto for regions, etc.).