vvizard writes "After almost a full year of development since PostgreSQL v7.1 was released, the PostgreSQL Global Development Group is proud to announce the availability of their latest development milestone ... PostgreSQL v7.2, another step forward for the project."
This is a huge step forward in making PostgreSQL ready for deployment in the enterprise. Eliminating the locking vacuum in favor of a separate statistics gathering process is clearly the best part of this release.
The only major hurdle left is replication built into the server.
I should probably clarify that - the full locking vacuum was separated out into two parts: one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did by reordering data blocks to shrink the size on disk. The bonus is that before, you might run vacuum once a week or so because of the impact a full lock on each vacuumed table has on a production system; now you can run it much more frequently, as all it consumes is CPU time. Shrinking size on disk is nice, but it's the statistics that help the query planner turn SQL into faster queries.
Always remember that table statistics are used as approximate best guesses for inputs to the query optimizer. It is not uncommon to actually see some types of queries run slower after table statistics have been updated. I've seen this on Oracle, Sybase and SQL Server. I doubt that this is an issue unique to those RDBMSs, since the conceptual implementations and underlying algorithms tend to be more or less the same.
That is definitely a risk. It is often possible to harm overall system performance by upgrading an RDBMS that includes optimizer improvements. Any changes to an optimizer will change execution plans. Hopefully most of them get better, but a few get worse, often dramatically worse. Finding the ones that get worse and tuning them is an important activity. Bad SQL plans are often the biggest-impact tuning target, so it is very important to understand what will happen to your specific application before you make changes that affect how your SQL statements are implemented.
This is one area in which Oracle shows its power over the open source databases. (It's also a big opportunity, because Oracle can be improved on.) Oracle can actually tap into continuous statistics gathering on a per-SQL level by using Oracle's v$sqlarea dictionary view. If you need high-powered scrutiny on a particular activity, you can trace the session to logs and see the row statistics at every step of the execution plan. Oracle has more optimizer hints, and has a facility to "pin" an execution plan so that it won't be reevaluated if optimizer behavior changes. Oracle is working toward server-side SQL tuning, where you can identify bad SQL and "intercept" it at runtime by adding hints on the server side. That will be an absolutely huge feature, since SQL often hits your system that you can't directly control but can predict.
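To make the v$sqlarea point concrete, here's a sketch of the kind of query a DBA might run against it to find expensive statements. I'm writing the column list from memory (sql_text, executions, disk_reads and buffer_gets are the standard ones), so check it against your Oracle release before relying on it:

```sql
-- Rank cached statements by logical reads per execution (a rough
-- proxy for "who is hurting the buffer cache the most").
SELECT sql_text,
       executions,
       disk_reads,
       buffer_gets,
       buffer_gets / GREATEST(executions, 1) AS gets_per_exec
  FROM v$sqlarea
 WHERE executions > 0
 ORDER BY gets_per_exec DESC;
```

From there you can pick off the top few statements and trace just those sessions, rather than logging everything.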
First, I'd like to say thanks for the information on Oracle. That's a very interesting route that I'd not considered before. Very interesting indeed.
As for the optimizer issues associated with updated statistics, I guess I would like to expand on this slightly. Most people don't realize that many optimizers have hard and arbitrary thresholds for determining the best possible algorithmic data path. This is why it is very important to understand not only your data distribution but the nature of the delta of your data distribution. Also key is the understanding of how multiple indexes will affect the optimizer's output (especially if multiple index types are supported by the RDBMS in question). A common cause of negative query performance is slight changes in distribution statistics which hit an arbitrary threshold, causing another data path to be recommended by the optimizer. Thus the significance of Oracle being able to pin query plans can be profound.
Because of these issues, myself and several other DBAs do not advocate blindly updating statistics simply for the sake of doing so. First of all, if your data distribution tends not to vary, this can result in nothing more than a waste of time. Furthermore, if your data distribution tends to vary only slightly, you'd better understand these trends, as it's very possible timely and updated statistics may work against you depending on where you were within your trend's cycle.
While I knew about the pinned query plans on Oracle, I didn't know about the planned server tuning hints. Abilities (assuming I understood you correctly) like this can go a long way toward addressing the issues I've raised about query optimizers.
I should probably clarify that - the full locking vacuum was separated out into two parts: one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did by reordering data blocks to shrink the size on disk.
This is incorrect (I believe it describes the situation after 7.1: the VACUUM ANALYZE command only needed to lock the table exclusively when VACUUMing; only a read lock was needed for ANALYZE).
In 7.2, the ANALYZE command can now be used separately, as you say. However, there are other (more important) improvements: ANALYZE only takes a look at a statistical sampling of the rows in the table. This means that collecting statistics on even enormous tables is very fast. Furthermore, VACUUM has been made "lazy" by default: this means that it doesn't attempt to reclaim space as aggressively as before, but it no longer requires an exclusive lock on the database (instead, it cooperates with other DB clients). The old behavior is available as "VACUUM FULL", and it is suggested whenever you need to reclaim a lot of disk space (e.g. you delete hundreds of thousands of rows of data and need the space).
you might run vacuum once a week or so
It was (and is) suggested that you run VACUUM once per day.
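For what it's worth, a 7.2-style maintenance routine can be sketched in a few lines of SQL (the table name here is invented for illustration):

```sql
-- Routine maintenance: lazy VACUUM plus ANALYZE. Neither takes an
-- exclusive lock in 7.2, so clients keep working while it runs.
VACUUM ANALYZE;

-- Only when a lot of disk space must actually be reclaimed (say,
-- after deleting hundreds of thousands of rows) do you pay for the
-- old exclusive-lock behavior, and only on the table that needs it:
VACUUM FULL big_log_table;
```

The point is that the cheap form can now run daily (or more often) from cron, and the expensive form becomes a rare, targeted event.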
it's the statistics that help the query planner turn SQL into faster queries.
As far as I know, you only really need to update your planner stats when you change the statistical distribution of your data. Of course, running ANALYZE reasonably often won't do any harm, and it is a relatively cheap operation (performance-wise).
I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.
The 7.1 vacuum analyze required table locks. Doesn't matter which phase of it required locks - it required exclusive locks because it vacuumed. By breaking that into separate commands, the need for downtime is reduced drastically (down to the example which you point out - deleting thousands of rows at a time).
I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.
I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.
I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.
No I didn't, read my post again.
The 7.1 vacuum analyze required table locks.
PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.
Doesn't matter which phase of it required locks
It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.
By breaking that into a separate commands the need for downtime is reduced drastically
This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.
I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.
I'll say that again: you reiterated everything I was thinking that I implied in my post... but true, not everything I said.
This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.
Yes, I misspoke there and left out a lot of detail - when I run a vacuum I always run it with analyze, since there's no way to analyze in 7.1 without vacuuming (at least that I'm aware of). Why would I want to run vacuum followed by vacuum analyze, only to have the tables exclusively locked twice? I want to do both - clean up unused rows and update stats - so I always run vacuum analyze. What's worse is that the 7.1 vacuum can deadlock with other processes because it uses exclusive locks.
And yes, I know that in 7.1.x they changed the locking behavior for vacuum analyze so that it did not hold an exclusive lock during the portion where it was analyzing the table. But that doesn't change the fact that it did hold an exclusive lock while vacuuming - which it had to do because the commands were joined together.
Effectively there were TWO good things that happened. Analyze was broken into a separate command - that alone would have been an improvement, since it would have allowed vacuums to be done distinctly from analyzes. Vacuum becoming lazy is a second good thing - and a very good thing it is. The short of all this is that now you can run vacuums and analyzes separately all day long if you want.
I'm not trying to get the last word ;-) really! I think we both can agree that 7.2 is a big leap forward.
It's nice to see that they took a year, actually did a great deal of work making it better, and it's gone from 7.1 to 7.2. I always thought it was kinda lame when someone changes 3 characters in output and jumps from 1.3 to 3.0. Damn I need sleep.
Interesting that their numbering is keeping pace with Red Hat's numbering scheme. Probably just a coincidence, but IIRC Red Hat is selling a version of PostgreSQL as the Red Hat Database or somesuch.
It would be pretty funny if they started mimicking Microsoft instead: PostGreSQLXP, PostGreSQL98, PostgreSQL2000AdvancedServer, etc. But I digress.
These guys are here to kick ass and chew bubble gum -- but they're all out of gum.
On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.
Next step, master-master replication and clustering. Only wish I knew enough about either topic to help out. I'm doing lots of research on it though, maybe someday.
I've recently set up a master-master replication environment on Oracle 9i, and I did some research to check if it was possible with Postgres.
In fact there are many solutions available (check techdocs.postgresql.org [postgresql.org] for a list...)
The most advanced guys on the subject seem to be the Swiss engineering school in Zürich (ETH). Here [inf.ethz.ch] is a list of their publications.
They seem to have developed a replication scheme (Postgres-R) where they get better than linear performance improvement when they add new masters... Quite impressive.
On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.
Jeez. I had no idea. If our MySQL server had this limit:
Uptime: 10277498 Threads: 7 Questions: 3678405287 Slow queries: 1584 Opens: 2299222 Flush tables: 1 Open tables: 256 Queries per second avg: 357.909
I'd be receiving a phone call in about a month from my client wondering why the site stopped working.;)
And just to make this look less like flamebait: I use and appreciate both MySQL and PostgreSQL. Do get a life, kids.
On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.
Jeez. I had no idea. If our MySQL server had this limit:
Uptime: 10277498 Threads: 7 Questions: 3678405287 Slow queries: 1584 Opens: 2299222 Flush tables: 1 Open tables: 256 Queries per second avg: 357.909
I'd be receiving a phone call in about a month from my client wondering why the site stopped working.;)
As others have pointed out, there's a difference between "Queries" and "Transactions". Now, people have often facetiously accused MySQL people of not knowing what a Transaction is, but here we have some solid proof. :-)
Seriously, the previous PostgreSQL limitation probably did have some serious ramifications for bigger installations, so it's a good thing it's history now. When I bring PostgreSQL up in polite public these days, Oracle jockeys now feel the need to critique it more seriously than the "don't make me laugh" attitude of even a couple of years ago. Not because PostgreSQL is that likely to sink a real high-end RDBMS, but it will be an increasingly plausible contender for users who don't really need or use all of the cool (but expensive) features of Oracle or DB2.
4 billion transactions includes updates, deletes and inserts. Not that it's great, but any system that's doing that many changes probably has enough money to spend on good hardware and needs a DB with excellent commercial support -- since the latter doesn't exist, Oracle or DB2 will do ;)
Selects aren't counted of course.
Yes, disk reclaiming locked whole tables. Mind you, until recently MySQL locked tables for every statement (including selects?), and so did Sybase (but not for selects, I don't think).
Locking a table for a minute once a day isn't nearly as bad as 150 times a second -- but of course Postgres does neither now.
VACUUM: Vacuuming no longer locks tables, thus allowing normal user access during the vacuum. A new "VACUUM FULL" command does old-style vacuum by locking the table and shrinking the on-disk copy of the table.
Transactions: There is no longer a problem with installations that exceed four billion transactions.
OIDs: OIDs are now optional. Users can now create tables without OIDs for cases where OID usage is excessive.
Optimizer: The system now computes histogram column statistics during "ANALYZE", allowing much better optimizer choices.
Security: A new MD5 encryption option allows more secure storage and transfer of passwords. A new Unix-domain socket authentication option is available on Linux and BSD systems.
Statistics: Administrators can use the new table access statistics module to get fine-grained information about table and index usage.
Internationalization: Program and library messages can now be displayed in several languages.
.. with many many more bug fixes, enhancements and performance related changes...
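Two of the highlights above are easy to show in a couple of lines of SQL (the table and column names are invented for illustration):

```sql
-- OIDs are now optional: skip them on high-volume tables where the
-- per-row OID overhead buys you nothing.
CREATE TABLE hits (
    url   text NOT NULL,
    stamp timestamp DEFAULT now()
) WITHOUT OIDS;

-- ANALYZE now samples rows and builds the histogram statistics the
-- optimizer uses, and can be run on its own, without a VACUUM.
ANALYZE hits;
```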
One issue that is not mentioned in the release highlights is the marked improvement that is now available for SMP boxes. In some cases throughput has been increased by more than a factor of 2.
We run PostgreSQL on a dual-processor Linux box to feed our OpenBSD web servers. We got a HUGE speed gain from the OpenBSD -> Linux change (even when we ran it on a slower machine while testing it), and any SMP gains will be helpful.
When we did OpenBSD we had to be VERY careful not to do more queries than necessary (including some complicated joins and then having PHP parse the results). With Linux as the database server, I feel that I can throw hardware at it (including moving to Solaris if need be) and optimize the queries a bit less to abstract the programming.
SMP improvement is important, as the next step up for us is a Quad-Xeon processor, then Sun Hardware. (PostgreSQL seemed to run best on Linux and Solaris from the old website)...
It's such a shame that they never figured out the PostgreSQL support model. I would have happily paid for some support, but it always seemed easier to get the OpenBSD port or the Red Hat RPM than pay for their CDs. They never included much beyond installation support. I knew how to install it; having some support (not the mailing list) for some of my optimization questions would have saved days and been worth a support contract.
I was referring to Great Bridge - one of the core teams tried to start a PostgreSQL business and failed.
I know about redhatdb. For $2500, I wasn't really sure what I got. It may become the solution in the future, however.
Our Red Hat 6.2 boxes would get rooted if we did the install before lunch and came back to fix them after lunch.
Now they sit inside a nice safe OpenBSD network. Red Hat Database looks promising, it's just sad that Redhat stole their business because they couldn't get their shit together.
I couldn't figure out what your support policy was. There were no support contracts or anything else. It seemed like our only option was to pay $500 for installation help. It was ridiculous.
Look, I'm a small guy, but I was having pain in the ass optimization problems getting OpenBSD and PostgreSQL to run at a reasonable speed. However, I could get it running. I am 100% certain that if I called support with the $500 CD, I'd be told that I was SOL.
I can't believe there wasn't a migration program. You're right, that was the biggest mistake. A better solution would be if you could cater to dot-coms that did a 2-3 year Oracle license that was coming to the end and needed cash. They could choose to pay for more Oracle or choose to pay your conversion fee. That would get you somewhere.
I can sympathize with management not knowing how to make things fly. However, if you won't charge for software updates (can't on BSD code without making it non-BSD) then worrying primarily about the next software release is silly.
Such a shame. PostgreSQL is a great product, we power all our clients on it. I would have been more comfortable with some support arrangement, but nobody seemed terribly interested in selling one to me. We've gone so long without one, I would only need one as an insurance policy of sorts, not really for the kinds of help I would like.
Tis a shame, better (and bound) documentation and other niceties would have been nice.
Well, look on the bright side, a good job at a company with hot tech is a good resume line as long as the tech stays hot. I leveraged a summer job at Citrix into the training to get my MCSE (this was 4 years ago) and numerous jobs as an NT Admin whereever a Citrix server was involved.
I really wish there were an embeddable version of PostgreSQL... It's a very good database, but it's sometimes a real pain to write a program that ties together a SQL database with anything else, unless it's a local-use-only program. I know MySQL added this feature in 4.x (but their transaction support is too new, IMO).
I've used ecpg (ESQL/C, similar to Oracle's Pro*C) since I started using postgresql years ago, and have had no problems at all accessing the data from C or C++ code.
If you're using Java then this is even more of a no-brainer - just use the JDBC binding.
Out of curiosity, is there some reason why you'd want PostgreSQL specifically for this task?
There are a wide variety of embedded database toolkits that implement RDBMS functionality without the requirement of running a server product.
The eldest example would likely be the classic DBM file, with its descendants NDBM and GDBM, though these are fairly simplistic.
Your next step up would likely be BDB, which, like DBM, uses its own API, but provides a much richer set of functionality. The concurrent access, transaction, and recovery options are exceptional.
And finally, if you're looking specifically for a SQL interface, the SQLite package provides a fairly complete SQL92 implementation in its toolkit, including atomic commit and rollback.
It also has a command-line "monitor" much like the pgsql or mysql commands for administration.
I'm not necessarily dismissing the concept of in-process access to PostgreSQL tables, but given the choices already available for embedded databases, and the unique requirements in writing one (e.g. locking without a controlling process), I don't quite see what the motivation would be.
I only have one real gripe about PostgreSQL- I hate the upgrade path.
Having to dump the database to disk and re-import is a bad thing IMO. Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)
A separate program to preserve LOBs I can rationalize (it's a lot of generally unneeded overhead since few people use LOBs).
It would really be nice if someone would write some wrapper programs to check for foreign keys and LOBs, then wrap the pg_dumpall & pg_dump commands with the appropriate options into one set of programs.
Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)
If you are certain that there are no constraint violations, because you've just exported from one database, then disabling them would make importing into a new database much faster. It's common practice, you just re-enable them before you make the new database available for transactions.
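As a sketch of that practice (the names are invented, and exact ALTER TABLE support varies between PostgreSQL versions, so treat this as the generic pattern rather than gospel):

```sql
-- Bulk-load first, enforce afterwards: COPY runs without paying a
-- per-row foreign key check.
COPY invoices FROM '/tmp/invoices.dat';

-- Re-attach the constraint once the data is in; adding it verifies
-- all existing rows in a single pass.
ALTER TABLE invoices
    ADD CONSTRAINT invoices_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
```

One pass of verification at the end is far cheaper than a check per inserted row.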
It sure would be nice to be able to do major hackery with the data formats, and have an easy in-place upgrade; unfortunately, this comes with two big costs:
It takes a lot of work to do this, which will seem pointless to many when "dump/re-import" is already there and works;
More importantly, debugging this, across umpteen platforms, possible variations on how folks compiled it, and moves between minor version numbers that may differ more or less than you'd expect is really a lot of work.
The unavailability of a "Ext2-to-ReiserFS" translator, or Ext2-to-JFS or Ext2-to-XFS occurs for much the same reason, albeit with the further challenge that the "upgrade" would somewhat more resemble a "MySQL-format-to-PostgreSQL-format" conversion:-).
Eek, overload. Client = application that connects to database. Client = person that is paying you to do stuff. Hmm..
If you're talking about Windows applications that talk to the database, PostgreSQL already has what you need. Just run PostgreSQL on the Unix box and get the ODBC driver (yes, there is one) for the Windows client machines, and you're all set. It works.
If you're talking about Windows people, well, think of this as an infiltration opportunity. Just sell 'em a dedicated database server machine, and they don't need to know what OS it's running.:-)
I *just* finished my FreeBSD 4.5 upgrade on my database server last weekend. It was a big effort, as I've got multiple databases with tables ranging from a few to 7 million rows, with referential integrity, averaging at least one insert every ten seconds, every day.
PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (like MySQL is).
Here is a quick description of what it means to be ACID [webtechniques.com] compliant:
1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.
2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record, if that meant leaving its invoices, and other associated records stranded.
3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.
4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for committed transactions whose changes had not yet been written to the data files, and applies them.
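A tiny example of atomicity and consistency working together, using an invented accounts table:

```sql
-- Transfer $100 between accounts: either both UPDATEs take effect,
-- or (on error or crash) neither does.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If the server dies between the two UPDATEs, recovery rolls the
-- transaction back from the log; no money vanishes or appears.
```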
It is difficult to trust mission-critical data to a database that does not guarantee it will complete transactions without screwing up (short of a bug, of course); thus such compliance--even when it is more political than technical--is very important.
All true. In fact, MySQL is neither relational nor a DBMS by any stretch of the imagination, if you really understand what these terms mean.
People who try to argue otherwise (including the developers of MySQL) need to go back to school to relearn their math and CS fundamentals. There's a bit of discussion of this at http://searchdatabase.techtarget.com/tip/1,289483,sid13_gci788645,00.html?FromTaxonomy=%2Fpr%2F284872
Here's an example of why an ACID database is useful that hits close to all of our hearts - Slashdot moderation:
You may have noticed that if several people try to whack a troll at the same time, they all expend one moderator point, even if only a fraction of those points were required to push that troll down into the dreaded depths of -1.
If an ACID compliant database were used, and the two steps of whacking the troll and deducting the moderator points were placed in the same transaction (with a check constraint on the score of the posts to prevent them from dropping below -1), then the later moderators who tried to whack the troll would not have their points deducted, as the transaction would roll back when the constraint on the score of the post was exceeded.
Alas, MySQL is not ACID compliant, and so this senseless waste of moderator points continues to this day...
Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)
You can use transactions or locking with MySQL to keep consistency on this particular issue as well; however, under either database, the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than of moderation points -- which is why Slash doesn't implement it.
For clarification, you can't use transactions on MyISAM tables [yet] but it's not like transactions in MySQL don't exist at all. Just use BDB, InnoDB, or Gemini table types and you get transaction support.
Anyway, I don't care which DB you prefer or use or promote or whatever. I have run into programming problems with both Postgres and MySQL alike.
So, even though you are quite incorrect (which you probably already knew), at least you got me to respond to your troll. :)
You Said: Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)
I Reply:
I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful, was then elevated to the status of a "feature". You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.
You Said: the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than moderation point
I Reply:
For a system which receives as much activity as Slashdot, and with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling members of the rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.
Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations was constrained in the database, not only would users not see this problem, but an error log generated (for the admins only) when a transaction is rolled back in a situation where it is not expected would have helped isolate the fault very quickly.
The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.
As for the speed issue: if you are willing to sacrifice verification of correct operation for a small increase in speed, you have severely underspecified your hardware requirements.
Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:
--First we create a table for a couple of users
CREATE TABLE "users" (
    "uid" serial PRIMARY KEY,
    "mod_points" integer DEFAULT 0,
    "name" text NOT NULL,
    CONSTRAINT "user_mod_const" CHECK (mod_points > -1 AND mod_points < 6)
);
--now a table for some posts
CREATE TABLE "posts" (
    "date" timestamp with time zone DEFAULT now(),
    "pid" serial PRIMARY KEY,
    "parent" int4,
    "uid" int4,
    "mod" integer DEFAULT 1,
    "body" text NOT NULL,
    "section" integer,
    CONSTRAINT "mod_const" CHECK (mod > -2 AND mod < 6),
    CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid)
        ON DELETE CASCADE
        ON UPDATE CASCADE
); -- the constraint to ensure parent is equal to zero, or another pid in the posts table, is left to the reader.
--And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistent interface)
CREATE FUNCTION "mod_down" (integer, integer) RETURNS integer AS '
    UPDATE users SET mod_points = mod_points - 1 WHERE uid = $1;
    UPDATE posts SET mod = mod - 1 WHERE pid = $2;
    SELECT mod FROM posts WHERE pid = $2;
' LANGUAGE 'sql';
-- note: an SQL-language function body runs inside the calling transaction,
-- so an explicit begin/commit is neither needed nor allowed here.
As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.
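A quick sanity check of those constraints (a sketch against the tables just defined; the username and values are made up):

```sql
INSERT INTO users (name) VALUES ('taco');              -- uid assigned by the serial
UPDATE users SET mod_points = 99 WHERE name = 'taco';
-- fails user_mod_const, so the bogus value never reaches the table
INSERT INTO posts (uid, body) VALUES (9999, 'hi');
-- fails user_key, since no such user exists
```

Either statement aborts with an error instead of quietly corrupting the data, which is exactly the nasty-surprise elimination described above.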
Direct SQL is icky? Not if you grok functional languages.
I like SQL - I have no problems understanding SQL. I meant that statement in the sense that it feels cleaner (to me) to call stored procedures in the database from the outside, rather than sending the SQL statements to the database over and over.
It also serves as a handy layer of abstraction from your code, in the event that you want to make drastic changes to the structure of the underlying database.
Maybe I've just seen too much code with SQL commands scattered all over the place, doing the same thing in different fashions at different times, and using strange DB-specific constructs at random.
I would prefer to keep the DB-specific SQL extensions (if they must be used at all) in stored procedures in the database, and present a consistent interface of stored procedures to the external program. That way, you can support different databases, using the special features they each offer, by providing different database initialization files, and not require any changes to the main program.
I completely agree. Pretty much my entire job revolves around sticking Intranet front ends on databases of various types, in Cold Fusion no less, and when I look back over the code I write, I see some of the dirtiest, hackiest solutions I can imagine. I can't use any stored procs because they don't allow me access to the database, so I'm stuck with hacking stuff up on my end.
I'd like to rant for a moment: the MS Jet SQL engine sucks, and is the bane of my existence. Thank you.
You might as well make it a perl subroutine, or hell, just use the new perl procedural language inside MySQL itself.
PostgreSQL has a much better [postgresql.org] perl based stored procedure language than mySQL [tangent.org] has. Unlike the mySQL hack, the Pg implementation is a complete procedure language -- you can even use the normal DBI syntax to run queries against the database.
I find it a bit odd to claim non-ACID databases aren't taken seriously while in the same sentence mentioning that Yahoo uses a non-ACID database system. Obviously somebody there took it seriously enough to roll it out...
However, Yahoo is not a typical business. As the Yahoo article [mysql.com] says, "Finance managed its database needs through homegrown flat files and Berkeley DB databases. But those solutions proved to be inflexible and not scalable enough for their needs..."
Homebrew flat files will be walloped by any decent DB server, especially a blazing fast one like MySQL, any day of the week. Yahoo also tends to embrace open source and new technologies more readily. Most of their servers, for example, run FreeBSD and have since the beginning.
This is a slightly bad example considering that FreeBSD is unquestionably a product that you can trust your mission critical data to, but it illustrates to a degree the type of company that Yahoo is.
I, myself, would trust MySQL with anything that didn't matter to someone else. In the end--it's politics, and "nobody ever got fired for using an ACID compliant database.";-)
MySQL is ACID compliant if you use the InnoDB table type. InnoDB comes with the most recent version of MySQL. It supports transactions and rollback, row level locking, and referential integrity. Slashdot uses InnoDB.
Ok, one question. Can you do online backups yet with InnoDB or do you still have to shut down the system?
You can buy an incremental hot backup module from InnoDB for EUR250/year, or use MySQL replication to keep a backup constantly up to date (you can then stop replication and take down the slave DB to do a binary backup as required).
MySQL is most definitely capable of supporting ACID functionality.
One of the nice features of MySQL is the capability of having pluggable persistence managers. An example of that is the default, MyISAM, which you are correct in saying does not support ACID. But with the release of MySQL-Max [mysql.com], which happened awhileago(tm), and MySQL v4 out of the box, support for 3 additional backends was added: BerkeleyDB [sleepycat.com], Gemini, and InnoDB [innodb.com], all of which have complete ACID support. InnoDB also supports row level locking and even an initial implementation of foreign keys.
InnoDB is in use here at Slashdot as well as at a good deal of other sites [innodb.com] demanding high-transaction throughput with full ACID support.
With the addition of foreign keys and stored procedures functionality, all of which are on the slate for the 4.x series, the reasons not to use MySQL are lessening every day.
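For anyone who wants to try it, switching a table over is a one-liner (a sketch, assuming a MySQL-Max or v4 build with InnoDB enabled; the table name is hypothetical):

```sql
-- convert an existing MyISAM table in place
ALTER TABLE messages TYPE=InnoDB;

-- or declare the backend at creation time
CREATE TABLE messages (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL
) TYPE=InnoDB;
```

Depending on the build, you may also need an innodb_data_file_path line in my.cnf before the server will create an InnoDB tablespace.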
Side note: Yeah, I know Gemini is the red-haired stepchild of the MySQL world. It's still a decent table manager.
Yes, you can move to InnoDB tables, but your performance goes South quickly, thus taking away one of the few reasons to use MySQL.
One of my clients had a Linux Apache/PHP/MySQL box that was being heavily pounded by 250+ web-based instant messaging users. (I know... using a database to handle instant messages:-( ). Anyway, I, in my simple logic, thought
1. The message tables are constantly subject to INSERT, SELECT, and DELETE queries.
2. MyISAM tables require a full lock for every INSERT or DELETE
3. InnoDB tables have row-level locking
4. Thus, I will install MySQL-Max, and use InnoDB tables for the messaging data, thus relieving PHP of the continuous connect...wait...reconnect scenario
Well, I read every note about configuring MySQL for performance, and I made the change to InnoDB, and promptly every single one of the 30 or 40 MySQL processes began taking up at least 60 MB RAM! The system slowed to a standstill, and I had to roll back to regular MyISAM tables 'real quick'. (the users on the system take about 10 seconds to start pestering the admin with trouble reports)
I know the real answer is to use shared memory or some such, but anyway, the performance benchmarks always show PostgreSQL ruling when you have mixed INSERTS and DELETES.
One of my clients had a Linux Apache/PHP/MySQL box that was being heavily pounded by 250+ web-based instant messaging users. (I know... using a database to handle instant messages:-( ).
What's the real problem using a DB for instant messaging? What do you propose as an alternative?
Sounds like your tuning was out of whack. You're not very specific about what you tuned (heck, if everyone was specific, what fun would Slashdot be?) but I'd bet you had 2 wires crossed somewhere. We're running a system with what would appear to be a larger number of inserts and retrievals, for stock quotes, and have had completely different results from your experience. We're very happy with the performance and find that in a mixed insert/select situation InnoDB is quite a bit faster than MyISAM.
After your bad experience did you send an email to Mr. InnoDB, Heikki Tuuri [innodb.com]? He's quite attentive and I'm sure would like to hear what happened with your application.
Not to mention, when you're playing with databases or other vaguely unknown programmatic quantities, it's best not to test them in production. Perhaps if you had run the system in a test environment first, the lessened pressure would have left you better equipped to resolve whatever situation was occurring.
It occurs to me that an anti-MySQL rant is not terribly ontopic. I won't say "Karma Whoring", but that might be the POV of the moderators that gave you three Overrateds and a Troll. Or maybe the Slashdot editors are just peeved at your implied criticism of their design choices...
Anyway, you do have a good point and you've neatly summarized one requirement of a "real" database. I would just caution not to base all comparisons between two DBMSs on ACID support -- especially if one of them is MySQL. After all, there are a lot of specific things that distinguish a serious DBMS from a toy.
Not that MySQL has a lot of them. The only one that comes to mind is the SQL interpreter. A lot more is missing, and a lot of it is stuff most programmers will miss a lot more than transactional integrity.
The biggest omission, to my mind, is support for relational queries.
"BZZZT!" I hear you saying. "What do you mean MySQL doesn't support relational queries? You can do a join with select/where! And..."
Well, you can specify a join in MySQL. (Just as you can if your "database" is a simple collection of text files [eu.org].) But joins aren't very useful if you can't do nontrivial things with them. And you can't, because MySQL takes a terrible performance hit if your query involves more than one index. That's why Slashdot now numbers posts sequentially [slashdot.org]. Discouraging the first-posters was merely a side effect.
I didn't notice anything about online backups, point-in-time recovery, or standby databases. Is any of this possible on PostgreSQL yet? How about clustering/parallel server? Seems like these are important features for becoming an Oracle/SQL Server replacement.
I didn't notice anything about online backups, point-in-time recovery, or standby database
Online backups: yes, for quite some time
point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.
Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (ie a fully distributed database).
Point in time recovery is not yet available AFAIK. This is one critical feature that distinguishes commercial databases from Postgresql.
My company is using Postgresql in a traditional OLTP context, and we would very much like to have the point in time recovery. At some point it could make the difference between us continuing in the open source mode or switching to (ugh) Oracle!
I do understand that PIT recovery is being worked on. I wish I had time to help!
MySQL has master-slave replication. In fact, you can make trees of replication (good for high transaction volumes with massive redundancy) or even daisy-chain replication into a ring (giving you master-....-master).
It's too bad that it takes about 2 years to update the public perception when a product transitions from toy to tool.
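For reference, a minimal master/slave pair is just a few lines of my.cnf (a sketch for 3.23-era replication; the hostname and credentials are made up):

```ini
# on the master
[mysqld]
log-bin
server-id = 1

# on the slave
[mysqld]
server-id       = 2
master-host     = master.example.com
master-user     = repl
master-password = secret
```

Chaining into a tree or ring just means also enabling log-bin (and log-slave-updates) on each slave and pointing the next box at it.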
I have been a great fan of PostgreSQL since the 6.5.3 days. It has really come a long way, and it is very nice to see recognition on slashdot. I didn't notice any of the previous releases on slashdot, and I wasn't aware the editors paid it too much attention.
I think, above everything else, I appreciate the development process of PostgreSQL. I have read the mailing lists for a long time now, and I have learned a lot from the comments of the developers. The "hackers" list sometimes gets well above my head, but most of the issues are very interesting to read, and are applicable in many areas of logic. I commend all the developers for their hard work as well as their great talent.
I've been using mysql for close to three years now; maybe around 6 months back I tried out postgresql, and I switched a site or two over to using it. I did some lame, inaccurate benchmarking of a "typical page"; pgsql was twice as slow at the time using v7.1, untweaked, but mysql was also untweaked.
Anyway, let me tell you, pgsql's user permissions still make my head swim; it's a nightmare. I mean, ok, there's like how many different ways to authenticate a user: plain text password, crypted password, now md5, ident, local ident, kerberos, etc. etc. Seriously, what's the "preferred" way to add a normal, non-superuser that only has select, insert, update, and delete access to a given database and can connect from the local machine and remotely? Is this even possible? (yes, I know you have to give it the -i option for network access)
I guess another kind of oddity about pgsql is that out of the box, it only does ident-type local socket authentication, no tcp/ip. You have to add all these weird rules to a config file to enable these mysterious "accounts with passwords". Documentation inside the actual config files slays me, btw.
I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide. I know, rtfm and figure it out; trust me, I have. There's a boatload of postgresql docs, all split over 10 different manuals it seems.
Also, it would be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc. pg_ctl doesn't exactly cut it. I wrote one myself that's basically a wrapper for pg_ctl, but it's a major hack; I can clean it up and post it if anyone's interested.
Overall a good db. It definitely takes way longer than mysql to set up and understand, but the added features make it worthwhile. Even if you don't use the triggers, out-of-the-box transactions, and subquery support right away, you'll be glad when you do finally want to use them to help you out with data integrity. Sorry, I'm rambling; I have no real point, this is more of a "this is my experience with this thing" post. Maybe it will be of some use to someone, and hopefully I'll get an intelligent post or two setting me straight.:)
Anyway, let me tell you, pgsql's user permissions still make my head swim; it's a nightmare. I mean, ok, there's like how many different ways to authenticate a user: plain text password, crypted password, now md5, ident, local ident, kerberos, etc. etc.
Options are sometimes considered to be a good thing.
Seriously, what's the "preferred" way to add a normal, non-superuser that only has select, insert, update, and delete access to a given database and can connect from the local machine and remotely? Is this even possible?
Add something like this to your pg_hba.conf:
local sameuser trust
host sameuser 127.0.0.1 255.255.255.255 trust
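"trust" aside, the grant half of the question is plain SQL once the pg_hba.conf entries are in place (a sketch; the user and table names are made up):

```sql
-- as the superuser:
CREATE USER webuser WITH PASSWORD 'secret';

-- then, in the database in question, for each table:
REVOKE ALL ON posts FROM PUBLIC;
GRANT SELECT, INSERT, UPDATE, DELETE ON posts TO webuser;
```

Combine that with a "password" (or, in 7.2, "md5") line in pg_hba.conf for remote hosts and you have a non-superuser with exactly those four rights.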
I guess another kind of oddity about pgsql is that out of the box, it only does ident-type local socket authentication, no tcp/ip.
We like the default setup to be reasonably secure.
I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.
Seriously, what's the "preferred" way to add a normal, non-superuser that only has select, insert, update, and delete access to a given database and can connect from the local machine and remotely? Is this even possible?
Add something like this to your pg_hba.conf:
local sameuser trust
host sameuser 127.0.0.1 255.255.255.255 trust
That's not authentication! "trust" just allows logins, period. Try "psql -U postgres" as anyone on that machine. You'll instantly be logged in as the superuser.
Something like this works fairly well on Postgresql 7.1:
host all 127.0.0.1 255.255.255.255 ident sameuser
host all 0.0.0.0 0.0.0.0 password
Then enable TCP/IP connections ("tcpip_socket = true" in postgresql.conf)
Very important: make sure your ident server is trustworthy. Many ident servers have an option to allow a user to fake identification. Turn it off.
Also, the config I posted there will let any user connect to any database. That's the simplest, but not the most secure. The "sameuser" in the database field won't be enough to let the superuser connect to databases. You might add a separate line for that with an ident map containing only postgres (the file would have only the words "postgres postgres" in it, on one line). And then "all" in the database field with that map. I.e., "host all 127.0.0.1 255.255.255.255 ident postgres"
For remote connections, just make sure they have a password in the database:
create user slamb with password '12345';
alter user bob with password 'newpassword';
There's no authentication method specified here for UNIX domain sockets, so they just don't work. You'll need to set the PGHOST="localhost" environment variable for stuff to authenticate correctly. I did this because pgsql 7.1 did not support ident on UNIX domain sockets; pgsql 7.2 now does, on certain platforms. (Just replace "host <db> <ip> <netmask> ident <map>" with "local <db> ident <map>")
pgsql 7.2 adds pam support. If your UNIX and PostgreSQL usernames correspond, it should work.
pgsql 7.2 also adds support for encrypted passwords. There's an option for storing passwords encrypted in the database and an option for challenge-based authentication. I think these methods are incompatible - good challenge-based authentication requires the password be stored in plaintext on the server.
There has been Kerberos auth for some time. I'm trying to switch over to this now, as I'm setting up Kerberos on my network. It's a more complicated system to set up correctly, though. Get something else working first.
I'm sure I'm not the only slash-dotter who was on the verge of exceeding the 4 billion transaction limit on their pgsql-based Anime fan fiction submission website.
Congratulations to the pgsql developers on the new release. It looks like the only showstopper now for heavy duty use is lack of replication. I know that there are developers working on this. Does anyone know what sort of progress is being made?
There are a few decent books about PostgreSQL out there now. It is so much nicer than a few years ago.
Practical PostgreSQL. [amazon.com]
I think this one just came out as a bound book. I just got it a couple days ago and it is pretty good.
It is also online. [commandprompt.com]
Beginning Databases with PostgreSQL. [amazon.com]
This is one of those Wrox books which is about 10000 pages, including 80% of what I want to know and 2000% of what I don't.
There are others, but I think they are weaker.
I was disappointed with the one just called
PostgreSQL. [amazon.com]
PostgreSQL also comes with some great documentation (probably the best I've yet seen with any software I've used (the only contender I can think of offhand might be the Devpac assembler on the AtariST many years ago))
(This is also of course online, e.g. the 7.1 version is up at http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/ , or interactively at http://www.postgresql.org/idocs/)
So how do we encourage other projects to develop such documentation, which I assume is very time consuming and difficult, as well as being for very little reward? (How often do we see posts appreciating new software features compared to those acknowledging the associated documentation?)
I've been using a 7.2 beta on a production website for a while now. I know it's not generally considered kosher to run a beta of a new release on a production website, but it isn't exactly a high-traffic site, and it's been running without a hitch since I started it up a month ago. I'd estimate it does a few thousand transactions a day. (Not true begin-commit-rollback transactions, just updates/inserts and selects.)
A few good features...
- vacuum doesn't need to lock things up all the time, which is definitely nice.
- the speed increase is noticeable, especially on a dual P3 SMP.
- there have been a few changes to a number of the datatypes, like the increased resolution on timestamps (now recorded with milliseconds) and changes to char and varchar types (they now reject strings longer than their limits instead of silently truncating -- i.e. a varchar(30) won't accept 31 characters by cutting off the excess).
- you can supply passwords via md5 hashes rather than plaintext or crypt, which is definitely good, especially for connections made over the wire. You can also supply passwords via UNIX domain sockets now, which for some reason you couldn't before. (You had to start postmaster with -i and connect via an INET port for that sort of authentication. Ew.)
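The char/varchar change is easy to see for yourself (a sketch; the table name is made up):

```sql
CREATE TABLE t (name varchar(3));
INSERT INTO t VALUES ('abcd');
-- 7.1 silently stored 'abc'; 7.2 rejects the insert with an error instead
```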
The databases are not binary compatible, so upgrading from 7.1 to 7.2 will require a pg_dump-from-7.1-initdb-and-dump-into-7.2. Life is rough.
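In concrete terms, the dump/reload looks roughly like this (a sketch; the paths, ports, and database name are assumptions, with both versions installed side by side):

```shell
# dump from the running 7.1 server with its own pg_dump
pg_dump -p 5432 mydb > mydb.sql

# initialize and start a fresh 7.2 cluster on a second port
/usr/local/pgsql72/bin/initdb -D /usr/local/pgsql72/data
/usr/local/pgsql72/bin/pg_ctl -D /usr/local/pgsql72/data -o "-p 5433" start

# recreate the database and load the dump with the 7.2 client
/usr/local/pgsql72/bin/createdb -p 5433 mydb
/usr/local/pgsql72/bin/psql -p 5433 mydb < mydb.sql
```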
I've been working with it quite a bit and I haven't had any real problems throughout beta testing and the RCs. The "production" site will probably be upgraded later today, or tomorrow.
I have no idea about RPMs, since I always compile from source, but check out freshrpms or rpmfind.net or something.
Yes, I know it should be avoided, but I'll ask anyway since sometimes you have no other choice:(
I'm currently using MySQL on a Win2k server and it actually runs pretty okay and is very easy to install. What about PostgreSQL? Last I looked at it - it was a lot more difficult to get running on Windows. Has that changed or are PostgreSQL still more or less *nix only?
Maybe he wants to learn about pgsql and alternatives to mysql? That would be healthy!
I myself, with no experience of cygwin, found it a pain in the ass to get postgres working on windows. There was no quick and easy howto on the postgres site at the time; I'm still unsure if a quick step-by-step guide exists today.
I'm sure this puts a lot of people off from initially trying out postgres. MySQL makes it a lot easier to check out on windows: simply run the installation program and it's up and running.
One feature I find really useful (I do lots of web programming) in MySQL that I couldn't find an equivalent of in Postgres is FULLTEXT indexes... The only thing I could find was a hack in the contrib repository...
Is there any plan to add this or some equivalent? Is it already there and I just missed it? (it's one of the reasons I haven't done Postgres versions of my open source releases)
Alter table improvements. Changing schemas now with triggers etc is a nightmare. How are others doing this?
I can't figure out how to avoid "" around "every" "single" "identifier" which is annoying.
Continued increased speed and performance. We can do database-backed sessions in MySQL damn fast, and it has been tricky to get postgresql to perform as well.
Reclaim space without forcing table locks. The fact is, I'm always amazed folks can even lock their tables for an hour. How do you run 24x7 systems with that? The new VAC is a great step.
Improve the website. It is ugly and hard to navigate. Just copy php.net/mysql.com or some other reasonable developer portal.
Ingres code did make it into Postgres, but I don't know if any of it is in PostgreSQL. I'm sure there was a lot initially, but I seriously doubt there's any there anymore.
Keep in mind that when you abbreviate PostgreSQL to "Postgres", you're really talking about a separate, older product.
I guess it doesn't really matter, since Postgres is long gone, but it still annoys me every time I see it.
Keep in mind that when you abbreviate PostgreSQL to "Postgres", you're really talking about a separate, older product.
I guess it doesn't really matter, since Postgres is long gone, but it still annoys me every time I see it.
To call PostgreSQL a "separate" product is not entirely accurate. The reason for the name change (first to Postgres95, and later to PostgreSQL) was because they dumped the PostQuel query language in favor of good ol' SQL. Under the hood, it's still the same DBMS (although somewhat enhanced and several versions more mature).
Go to PostgreSQL.org [postgresql.org], and you will discover that in the opening pages of their tutorial is the following line:
The terms "Postgres" and "PostgreSQL" will be used interchangeably to refer to the software that accompanies this documentation.
So if you have a problem with people saying "postgres" instead of "postgreskew-ell", take it up with the folks writing the documentation.
They're not using MySQL to store all their critical data. They're dumping all their data, presumably from some other more reliable database (Oracle, it sounds like), into mysql for quick web searches.
IOW, they're using mysql for what it does best: As a fast datastore for when data integrity isn't important (because they have all the data backed up in Oracle and could redump it to mysql at any time).
Admittedly, some of this post is conjecture, but you'd be crazy to suggest that the Census Bureau would trust all their critical data to mysql.
Don't take this the wrong way, but that is seemingly exactly the kind of project MySQL was made for. Data access appears to be primarily read-only, with none of it being critical. Worst case, if the database becomes corrupt, you simply restore the whole thing from backup. Since the data doesn't appear (based on the description) to be changing rapidly (if at all, or perhaps via backend updates only), issues such as online backup or high availability are likely not a concern in the least. If they were, they'd probably be using that Oracle license.
Postgres is rock solid, but notoriously slow. MySQL is wicked fast, but doesn't scale well.
Oh puh-lease. That old "postgres is slow" argument is ancient. While mysql may be faster for simple insert/selects, that's where it stops, and compared to today's postgres it's not that much faster. Update your FUD, please.
So, you agree that postgres is slower, but it's FUD when I say it's slower. Whatever, dude.
Read your comment again:
Postgres is rock solid, but notoriously slow.
"Notoriously slow" is where I have the problem. It's not notoriously slow. MySQL is marginally faster than PostgreSQL on simple insert/select statements, but that's it. So yes, you are spreading FUD, dude.
1. The acronym "FUD" does not mean "saying bad things about a program". FUD stands for "Fear, Uncertainty and Doubt". Spreading FUD would be saying something like "you better not have your company depend on Java, because Sun might be out of business in a few months." It's only FUD if you are trying to make people AFRAID of using a product. If you think that scaring people away from postgres was the main thrust of my post, you need to read it again. To say postgres is slower than MySQL is not spreading FUD, it's stating a fact. A fact that I am well aware of, because I use PostgreSQL 7.1 every damned day, both at work, and in my own home projects.
2. The word "notorious" does not mean extreme. It means "well noted". Since even you, who is obviously a passionate defender of all things postgres, acknowledge that MySQL is at least slightly faster, my statement was essentially correct.
Learn what words mean before you start shooting your mouth off.
The acronym "FUD" does not mean "saying bad things about a program". FUD stands for "Fear, Uncertainty and Doubt".
Almost correct. Taken purely as an acronym, that is what FUD means. However, the definition has grown to include tactics such as half-truths and downright wrong information. Your statements could have been taken as FUD as you were saying that PostgreSQL is notoriously slow. That's just like "Linux is useless compared to Windows on the desktop," or "MySQL doesn't support transactions." These things may have been true in the past but are not now, so they are considered statements of FUD. Back to my particular statement: you're casting doubt over the performance of the system.
The word "notorious" does not mean extreme. It means "well noted".
That is exactly correct. Your well-noted observation that PostgreSQL is slow is dated and unsubstantiated. Even older versions of PostgreSQL fare much better than MySQL when you start doing more than one update to a table at a time or involve anything but trivial inserts and selects. Earlier versions also had synchronous disk access on by default, which hurt performance significantly. Both of these facts are well-noted as well, but you did not bring them up, which to me also helps the FUD claim: you're only telling half the story.
I'm not really a passionate defender of all things Postgres; I just hate seeing the same old tired and dated arguments brought up again and again as they simply don't apply anymore.
I know what the words mean; don't try to cast uncertainty and doubt over my English proficiency, please.
You said yourself that postgres is, in fact, slower than MySQL. There's a lot of good reasons why it's slower. An enterprise-level database needs certain features (which MySQL lacks) which will always cause a performance hit.
Even had you not conceded that point (which makes everything I said *fact*, rather than *FUD*), I don't really feel like I need you to tell me how spiffy postgres is, because, like I said, I use postgres all the time, and have compared its speed to enough other systems to know how well it performs.
What astonishes me most of all about all your whining about "FUD" was that, when I posted my original message, I actually thought it would be the MySQL fans that would be more likely to get pissed off at me (for calling MySQL a rake to postgres's tractor). The fact that you took that post, which basically said that both DBMSs have their uses, as a "FUD" attack on postgres leads me to the only possible conclusion: that any conversation with you must be like walking on egg shells, because you are too fucking touchy. Take a few breaths, re-read my original post, and you just might realize that it was posted by somebody with no agenda against your favorite program.
You said yourself that postgres is, in fact, slower than MySQL. There's a lot of good reasons why it's slower. An enterprise-level database needs certain features (which MySQL lacks) which will always cause a performance hit.
You're quite correct. However, and as I'm sure you're aware, PostgreSQL is oftentimes faster than MySQL when you're doing anything other than simple inserts and selects. If you'll read my original post again you'll see that I was upset when you didn't justify the "slower" than MySQL; you just blanket-statement said that it was slower, which is often taken as "always."
What I don't understand is why you think I'm the one walking on eggshells, or if I'm trying to convert you to PostgreSQL. All I was doing was calling you to task on your statement. You say you've got all this database experience, so I'm sure that we're just arguing over semantics here. (by the way, I do thank you for keeping the argument civilized; it's rare on /.) MySQL is faster than PostgreSQL for the simple stuff. We both agree on that. PostgreSQL is taken more seriously than MySQL because of the features; I think we both agree with this too. And finally, PostgreSQL is faster than MySQL when you are doing any "real" database work (more than just simple inserts and selects on simple tables); I think we both agree on this as well.
As far as my resume goes: so what? I have enough relevant experience to discuss these matters, and it seems that we're just arguing over nothing anyway; does a 30-year veteran have anything more important to say than a 3-year programmer? We're not talking about something where vast differences in experience make much difference, if any.
ok, so replace encryption with one-way hashing. It's effectively the same thing here; it enhances the security of your passwords, a very, very welcome step, and not something to be treated lightly.
One of the biggest problems I have with database servers are all the passwords that end up floating around semi-protected, this is a GREAT new feature!
An easy way to approach this is to install 7.2 in a separate path from the 7.1 installation.
If done right (i.e. --prefix=/usr/local/pgsql72) you can easily compile, install, and have everything ready to go.
Initdb to the /usr/local/pgsql72/data location, then fire up 7.2 on a separate port (5433 rather than 5432 -- you may wish to move the socket too). Set up & test your security aspects.
Shut down the webserver talking to the 7.1 system and close any applications, then dump the 7.1 database and load it into 7.2 (e.g. pg_dumpall from 7.1 piped into the 7.2 psql).
Of course, this takes some time and requires double the disk space. While this is running upgrade any changes to your software that you wish to apply (datatype changes, features being outdated, etc.).
As soon as it's done, shut down both the 7.1 and 7.2 databases. Reset the port and socket to the default location for 7.2 and fire it back up. Alternately you can have your application look in a different spot for a connection.
Fire back up the webserver, or re-enable connections to the system. A 4GB database using the above method with a relatively simple structure (100 tables, 50 or so functions, etc) took about 10 minutes of actual downtime.
Of course, several test runs should happen beforehand. I strongly suggest you enable md5-style passwords at the same time and update the users' password storage mechanism -- as far as I know, ODBC doesn't support these yet.
"Religion is something left over from the infancy of our intelligence, it will
fade away as we adopt reason and science as our guidelines."
-- Bertrand Russell
Congrats to the PostgreSQL Development team! (Score:3, Insightful)
The only major hurdle left is replication built into the server.
Re:Congrats to the PostgreSQL Development team! (Score:5, Interesting)
I should probably clarify that: the full locking vacuum was separated into two parts, one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did by reordering data blocks to shrink the size on disk. The bonus is that before, you might run vacuum only once a week or so because of the impact a full lock on each table has on a production system; now you can run it much more frequently, as all it consumes is CPU time. Shrinking size on disk is nice, but it's the statistics that help the query planner turn SQL into faster queries.
Re:Congrats to the PostgreSQL Development team! (Score:4, Insightful)
Re:Congrats to the PostgreSQL Development team! (Score:3, Informative)
This is one area in which Oracle shows its power over the open source databases. (It's also a big opportunity, because Oracle can be improved on.) Oracle can tap into continuous, per-SQL statistics gathering through its v$sqlarea dictionary view. If you need high-powered scrutiny of a particular activity, you can trace the session to logs and see the row statistics at every step of the execution plan. Oracle has more optimizer hints, and has a facility to "pin" an execution plan so that it won't be reevaluated if optimizer behavior changes. Oracle is working toward server-side SQL tuning, where you can identify bad SQL and "intercept" it at runtime by adding hints on the server side. That will be an absolutely huge feature, since SQL often hits your system that you can't directly control but can predict.
Re:Congrats to the PostgreSQL Development team! (Score:2)
As for the optimizer issues associated with updated statistics, I'd like to expand on this slightly. Most people don't realize that many optimizers have hard, arbitrary thresholds for determining the best possible algorithmic data path. This is why it is very important to understand not only your data distribution but the nature of the delta of your data distribution. Also key is understanding how multiple indexes will affect the optimizer's output (especially if multiple index types are supported by the RDBMS in question). A common cause of negative query performance is a slight change in distribution statistics that hits an arbitrary threshold, causing the optimizer to recommend another data path. Thus the significance of Oracle being able to pin query plans can be profound.
Because of these issues, myself and several other DBAs do not advocate blindly updating statistics simply for the sake of doing so. First of all, if your data distribution tends not to vary, this can be nothing more than a waste of time. Furthermore, if your data distribution tends to vary only slightly, you'd better understand these trends, as it's very possible that timely, updated statistics may work against you depending on where you are within your trend's cycle.
While I knew about the pinned query plans on Oracle, I didn't know about the planned server tuning hints. Abilities (assuming I understood you correctly) like this can go a long way toward addressing the issues I've raised about query optimizers.
In short, know your database AND your data!
Re:Congrats to the PostgreSQL Development team! (Score:4, Interesting)
This is incorrect (I believe it describes the situation after 7.1: the VACUUM ANALYZE command only needed to lock the table exclusively when VACUUMing; only a read lock was needed for ANALYZE).
In 7.2, the ANALYZE command can now be used separately, as you say. However, there are other (more important) improvements: ANALYZE only takes a look at a statistical sampling of the rows in the table. This means that collecting statistics on even enormous tables is very fast. Furthermore, VACUUM has been made "lazy" by default: this means that it doesn't attempt to reclaim space as aggressively as before, but it no longer requires an exclusive lock on the database (instead, it cooperates with other DB clients). The old behavior is available as "VACUUM FULL", and it is suggested whenever you need to reclaim a lot of diskspace (e.g. you delete hundreds of thousands of rows of data and need the space).
It was (and is) suggested that you run VACUUM once per day.
As far as I know, you only really need to update your planner stats when you change the statistical distribution of your data. Of course, running ANALYZE reasonably often won't do any harm, and it's a relatively cheap operation (performance-wise).
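To make the 7.2 division of labor concrete, here is a sketch of what a maintenance routine might look like (the table name `invoices` is purely hypothetical):

```sql
-- Cheap: refresh planner statistics from a row sample; no exclusive lock.
ANALYZE invoices;

-- Lazy vacuum: reclaims dead rows cooperatively, so it is safe to run
-- while clients are still connected.
VACUUM invoices;

-- Old-style aggressive vacuum: takes an exclusive lock and compacts the
-- table on disk; reserve it for after mass deletes.
VACUUM FULL invoices;
```

The first two can now run on a frequent cron schedule; only the third still needs a maintenance window.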
Re:Congrats to the PostgreSQL Development team! (Score:4, Informative)
The 7.1 vacuum analyze required table locks. It doesn't matter which phase of it required locks - it required exclusive locks because it vacuumed. By breaking that into separate commands, the need for downtime is reduced drastically (down to the example which you point out - deleting thousands of rows at a time).
I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.
I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.
Re:Congrats to the PostgreSQL Development team! (Score:4, Informative)
No I didn't, read my post again.
PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.
It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.
This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.
On that, we agree
Re:Congrats to the PostgreSQL Development team! (Score:2)
I'll say it again: you reiterated everything I was thinking and implied in my post... though, true, not everything I said.
This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.
Yes, I misspoke there and left out a lot of detail. When I run a vacuum I always run it with analyze, since there's no way to analyze in 7.1 without vacuuming (at least that I'm aware of). Why would I want to run vacuum followed by vacuum analyze, only to have the tables exclusively locked twice? I want to do both: clean up unused rows and update stats. So I always run vacuum analyze. What's worse is that the 7.1 vacuum can deadlock with other processes because it uses exclusive locks.
And yes, I know that in 7.1.x they changed the locking behavior so that vacuum analyze did not hold an exclusive lock during the portion where it was analyzing the table. But that doesn't change the fact that it did hold an exclusive lock while vacuuming - which it had to do because the commands were joined together.
Effectively there were TWO good things that happened. Analyze was broken into a separate command - that alone would have been an improvement since it would have allowed for vacuums to be done distinctly from analyzes. Vacuum becoming lazy is a second good thing - and yes a very good thing it is. The short of all this is that now you can run vacuums and analyzes separately all day long if you want.
I'm not trying to get the last word ;-) really! I think we can both agree that 7.2 is a big leap forward.
Substantial improvements and it's only .1 higher (Score:1)
Re:Substantial improvements and it's only .1 highe (Score:2)
It would be pretty funny if they started mimicking Microsoft instead: PostGreSQLXP, PostGreSQL98, PostgreSQL2000AdvancedServer, etc. But I digress.
Re:Substantial improvements and it's only .1 highe (Score:2, Funny)
Awsome.. (Score:2)
On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.
Next step, master-master replication and clustering. Only wish I knew enough about either topic to help out. I'm doing lots of research on it though, maybe someday.
Re:Awsome.. (Score:2)
"They Live". What a film.
"We've got one who can see!"
(Kind of on topic actually, people are starting to get through the hype associated with some things and adopt stuff that works, like PostgreSQL).
Re: plication (Score:3, Informative)
In fact there are many solutions available (check techdocs.postgresql.org [postgresql.org] for a list...)
The most advanced guys on the subject seem to be the Swiss engineering school in Zürich. Here [inf.ethz.ch] is a list of their publications.
They seem to have developed a replication scheme (Postgres-R) with better than linear performance improvement when they add new masters... Quite impressive.
Quentin
Re:Awsome.. (Score:2)
On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.
Jeez. I had no idea. If our MySQL server had this limit:
I'd be receiving a phone call in about a month from my client wondering why the site stopped working. ;)
And just to make this look less like flamebait: I use and appreciate both MySQL and PostgreSQL. Do get a life, kids.
Re:Awsome.. (Score:2)
As others have pointed out, there's a difference between "queries" and "transactions". Now, people have often facetiously accused MySQL people of not knowing what a transaction is, but here we have some solid proof. :-)
Seriously, the previous PostgreSQL limitation probably did have some serious ramifications for bigger installations, so it's a good thing it's history now. When I bring PostgreSQL up in polite public these days, Oracle jockeys now feel the need to critique it more seriously than the "don't make me laugh" attitude of even a couple of years ago. Not because PostgreSQL is that likely to sink a real high-end RDBMS, but it will be an increasingly plausible contender for users who don't really need or use all of the cool (but expensive) features of Oracle or DB2.
Re:Awsome.. (Score:2)
Selects aren't counted of course.
Yes, disk reclaiming locked whole tables. Mind you, until recently MySQL locked tables for every statement (including selects??), as did Sybase (but not for selects, I don't think).
Locking a table for a minute once a day isn't nearly as bad as 150 times a second -- but of course Postgres does neither now.
highlights... (Score:5, Informative)
Highlights of this release are as follows:
Re:highlights... (Score:5, Informative)
Please mod this up! (Score:1)
Thanks for the info. I didn't know that.
What a karma whore...
Thanks for the notice! (Score:3, Interesting)
When we did OpenBSD we had to be VERY careful not to do more queries than necessary (including some complicated joins and then having PHP parse the results). With Linux as the database server, I feel that I can throw hardware at it (including moving to Solaris if need be) and optimize the queries a bit less to abstract the programming.
SMP improvement is important, as the next step up for us is a Quad-Xeon processor, then Sun Hardware. (PostgreSQL seemed to run best on Linux and Solaris from the old website)...
It's such a shame that they never figured out the PostgreSQL support model. I would have happily paid for some support, but it always seemed easier to get the OpenBSD port or the Redhat RPM than to pay for their CDs. They never included much beyond installation support. I knew how to install it; having some support (not the mailing list) for some of my optimization questions would have saved days and been worth a support contract.
Alex
I noticed Redhat's (Score:2)
I know about redhatdb. For $2500, I wasn't really sure what I got. It may become the solution in the future, however.
Our Red Hat 6.2 boxes would get rooted if we did the install before lunch and came back to fix them after lunch.
Now they sit inside a nice safe OpenBSD network. Red Hat Database looks promising, it's just sad that Redhat stole their business because they couldn't get their shit together.
Alex
Such a fucking shame... (Score:2)
Look, I'm a small guy, but I was having pain in the ass optimization problems getting OpenBSD and PostgreSQL to run at a reasonable speed. However, I could get it running. I am 100% certain that if I called support with the $500 CD, I'd be told that I was SOL.
I can't believe there wasn't a migration program. You're right, that was the biggest mistake. A better solution would be to cater to dot-coms whose 2-3 year Oracle license was coming to an end and who needed cash. They could choose to pay for more Oracle or pay your conversion fee. That would get you somewhere.
I can sympathize with management not knowing how to make things fly. However, if you won't charge for software updates (can't on BSD code without making it non-BSD) then worrying primarily about the next software release is silly.
Such a shame. PostgreSQL is a great product, we power all our clients on it. I would have been more comfortable with some support arrangement, but nobody seemed terribly interested in selling one to me. We've gone so long without one, I would only need one as an insurance policy of sorts, not really for the kinds of help I would like.
Tis a shame, better (and bound) documentation and other niceties would have been nice.
Well, look on the bright side, a good job at a company with hot tech is a good resume line as long as the tech stays hot. I leveraged a summer job at Citrix into the training to get my MCSE (this was 4 years ago) and numerous jobs as an NT Admin wherever a Citrix server was involved.
Re:highlights... (Score:1)
Re:highlights... (Score:2)
I have only one feature request for PostgreSQL... (Score:3, Interesting)
eecpg? (Score:2)
I've used ecpg (ESQL/C, similar to Oracle's Pro*C) since I started using postgresql years ago, and have had no problems at all accessing the data from C or C++ code.
If you're using Java then this is even more of a no-brainer - just use the JDBC binding.
Re:eecpg? (Score:2)
There are a wide variety of embedded database toolkits that implement RDBMS functionality without the requirement of running a server product.
The eldest example would likely be the classic DBM file, with its descendants NDBM and GDBM, though these are fairly simplistic.
Your next step up would likely be BDB, which, like DBM, uses its own API, but provides a much richer set of functionality. The concurrent access, transaction, and recovery options are exceptional.
And finally, if you're looking specifically for a SQL interface, the SQLite package provides a fairly complete SQL92 implementation in its toolkit, including atomic commit and rollback.
It also has a command-line "monitor" much like the pgsql or mysql commands for administration.
I'm not necessarily dismissing the concept of in-process access to PostgreSQL tables, but given the choices already available for embedded databases, and the unique requirements in writing one (e.g. locking without a controlling process), I don't quite see what the motivation would be.
Re:eecpg? (Score:2)
Re:I have only one feature request for PostgreSQL. (Score:3, Insightful)
Having to dump the database to disk and re-import is a bad thing IMO. Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)
A separate program to preserve LOBs I can rationalize (it's a lot of generally unneeded overhead since few people use LOBs).
It would really be nice if someone would write some wrapper programs to check for foreign keys and LOBs, then wrap the pg_dumpall & pg_dump commands with the appropriate options into one set of programs.
Re:I have only one feature request for PostgreSQL. (Score:3, Informative)
Postgres always keeps its integrity constraints, including when you dump and restore. It's done this, as far as I am aware, since at least 7.0.
LOBs are no longer a problem, since 7.1 supports unlimited row length with binary or ASCII data - just use 'bytea' or 'text' fields...
Chris
Re:I have only one feature request for PostgreSQL. (Score:2)
If you are certain that there are no constraint violations, because you've just exported from one database, then disabling them would make importing into a new database much faster. It's common practice, you just re-enable them before you make the new database available for transactions.
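One common way to get the same effect, sketched here with hypothetical table and column names, is to load the data before the constraint exists, then add the constraint once at the end (which validates the whole table in one pass instead of row by row):

```sql
-- Bulk-load first: COPY runs without per-row foreign-key checks,
-- because the constraint hasn't been created yet.
COPY invoices FROM '/tmp/invoices.dat';

-- Now add the constraint; the existing rows are validated once.
-- Safe here because the data came from a consistent dump.
ALTER TABLE invoices
    ADD CONSTRAINT invoices_cust_fk
    FOREIGN KEY (cust_id) REFERENCES customers (cust_id);
```

If the load ever did contain a violation, the ALTER TABLE would fail, so you still find out before opening the database for transactions.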
Sorta Like "MkExt2IntoReiserFS" (Score:2)
The unavailability of a "Ext2-to-ReiserFS" translator, or Ext2-to-JFS or Ext2-to-XFS occurs for much the same reason, albeit with the further challenge that the "upgrade" would somewhat more resemble a "MySQL-format-to-PostgreSQL-format" conversion :-).
Re:I have only one feature request for PostgreSQL. (Score:2)
Re:I have only one feature request for PostgreSQL. (Score:2)
Ask and you receive.
Cygwin has a port of Postgres that runs fine. Performance isn't as good as the Linux version, but not bad.
Give it a shot, it comes with the distro, you don't even have to compile it.
Re:I have only one feature request for PostgreSQL. (Score:2)
Eek, overload. Client = application that connects to database. Client = person that is paying you to do stuff. Hmm..
If you're talking about Windows applications that talk to the database, PostgreSQL already has what you need. Just run PostgreSQL on the Unix box and get the ODBC driver (yes, there is one) for the Windows client machines, and you're all set. It works.
If you're talking about Windows people, well, think of this as an infiltration opportunity. Just sell 'em a dedicated database server machine, and they don't need to know what OS it's running. :-)
Obligatory MySQL flames (Score:1, Redundant)
Genome Project (Score:1)
I like it when people use linux on non-x86 architectures.
Those Bastards! (Score:1)
Now I've gotta do it again!
Re:Those Bastards! (Score:2)
Re:Those Bastards! (Score:2)
Remember that being on the bleeding edge is a risk, especially if it might put your data at risk.
Why use PostgreSQL instead of MySQL?: ACID (Score:5, Informative)
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (as MySQL is).
Here is a quick description of what it means to be ACID [webtechniques.com] compliant:
1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.
2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record, if that meant leaving its invoices, and other associated records stranded.
3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.
4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for completed transactions that had not been committed, and applies them.
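The bank-transfer case from point 1 can be sketched in SQL (the `accounts` table and the account IDs are hypothetical):

```sql
BEGIN;

-- Move $100 between two accounts. If the server dies between these two
-- statements, atomicity guarantees that neither change survives a
-- restart: the transaction never committed, so it is rolled back.
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;

COMMIT;  -- only now are both changes made durable together
```

Without the BEGIN/COMMIT wrapper, a crash between the two UPDATEs leaves $100 vanished, which is exactly the disaster the atomicity property exists to prevent.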
It is difficult to trust mission-critical data to a database that does not guarantee it will complete and not screw up (short of a bug, of course); thus such compliance, even when it is more political than technical, is very important.
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2, Insightful)
People who try to argue otherwise (including the developers of MySQL) need to go back to school to relearn their math and CS fundamentals. There's a bit of discussion of this at http://searchdatabase.techtarget.com/tip/1,289483
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:5, Funny)
You may have noticed that if several people try to whack a troll at the same time, they all expend one moderator point, even if only a fraction of those points were required to push that troll down into the dreaded depths of -1.
If an ACID-compliant database were used, and the two steps of whacking the troll and deducting the moderator points were placed in the same transaction (with a check constraint on the score of the posts to prevent them from dropping below -1), then the later moderators who tried to whack the troll would not have their points deducted, as the transaction would roll back when the constraint on the score of the post was exceeded.
Alas, mysql is not ACID-compliant, and so this senseless waste of moderator points continues to this day...
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2)
You can use transactions or locking with MySQL to keep consistency on this particular issue as well; however, under either database, the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than moderation points -- which is why slash doesn't implement it.
For clarification, you can't use transactions on MyISAM tables [yet] but it's not like transactions in MySQL don't exist at all. Just use BDB, InnoDB, or Gemini table types and you get transaction support.
Anyway, I don't care which DB you prefer or use or promote or whatever. I have run into programming problems with both Postgres and MySQL alike.
So, even though you are quite incorrect (which you probably already knew), at least you got me to respond to your troll.
Let the whacking commence!
~GoRK
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:5, Informative)
Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)
I Reply:
I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful was then elevated to the status of a "feature".
You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.
You Said:
the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than moderation points
I Reply:
For a system which receives as much activity as Slashdot, and with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling members of the rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.
Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations were constrained in the database, not only would users not see this problem, but an error log entry generated (for the admins only) when a transaction is rolled back in a situation where it is not expected would have helped isolate the fault very quickly.
The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.
As for the speed issue: if you are willing to sacrifice verification of correct operation for a small increase in speed, you have severely underspecified your hardware requirements.
Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:
--First we create a table for a couple of users
CREATE TABLE "users" (
    "uid" serial PRIMARY KEY, -- must be unique for the FK below
    "mod_points" integer DEFAULT 0,
    "name" text NOT NULL,
    CONSTRAINT "user_mod_const" CHECK (mod_points > -1 AND mod_points < 6)
);
--now a table for some posts
CREATE TABLE "posts" (
    "date" timestamp with time zone DEFAULT now(),
    "pid" serial PRIMARY KEY,
    "parent" int4,
    "uid" int4,
    "mod" integer DEFAULT 1,
    "body" text NOT NULL,
    "section" integer,
    CONSTRAINT "mod_const" CHECK (mod > -2 AND mod < 6),
    CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
-- the constraint to ensure parent is equal to zero, or another pid in the posts table is left to the reader.
--And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistent interface)
CREATE FUNCTION "mod_down" (integer, integer) RETURNS integer AS '
    -- A SQL function runs inside the caller''s transaction, so both
    -- updates succeed or fail together; no explicit BEGIN/COMMIT needed.
    UPDATE users SET mod_points = mod_points - 1 WHERE uid = $1;
    UPDATE posts SET mod = mod - 1 WHERE pid = $2;
    SELECT mod FROM posts WHERE pid = $2;
' LANGUAGE 'sql';
As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:5, Insightful)
I like SQL - I have no problems understanding SQL. I meant that statement in the sense that it feels cleaner (to me) to call a stored procedure in the database from the outside, rather than sending the SQL statements to the database over and over.
It also serves as a handy layer of abstraction from your code, in the event that you want to make drastic changes to the structure of the underlying database.
Maybe I've just seen too much code with SQL commands scattered all over the place, doing the same thing in different fashions at different times, and using strange DB specific constructs at random.
I would prefer to keep the DB-specific SQL extensions (if they must be used at all) in stored procedures in the database, and present a consistent interface of stored procedures to the external program. That way, you can support different databases, using the special features they each offer, by providing different database initialization files, and not require any changes to the main program.
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2)
I'd like to rant for a moment: the MS Jet SQL engine sucks, and is the bane of my existence. Thank you.
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2)
PostgreSQL has a much better [postgresql.org] Perl-based stored procedure language than MySQL [tangent.org] has. Unlike the MySQL hack, the Pg implementation is a complete procedural language -- you can even use the normal DBI syntax to run queries against the database.
moderator point example not so good (Score:2)
update posts set mod=mod-1 where pid=($2) and mod > -1;
Then look at the row count and only deduct a moderator point if there's at least one row updated.
I agree that begin/commit is important; it's just that you can do without it in this instance so long as an individual update is atomic.
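Putting the parent's suggestion together, the whole moderation might look like the sketch below (column names borrowed from the earlier example; the parameter placeholders are illustrative):

```sql
-- Guarded downmod: the WHERE clause refuses to push a post below -1,
-- so the statement updates zero rows once the floor is reached.
UPDATE posts SET mod = mod - 1 WHERE pid = $2 AND mod > -1;

-- The application then inspects the affected-row count, and only if
-- at least one row changed does it charge the moderator:
UPDATE users SET mod_points = mod_points - 1 WHERE uid = $1;
```

Because a single UPDATE is atomic, two moderators racing to whack the same -1 troll can't both get through the guard, which is the property the begin/commit was buying in the longer version.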
while that's true (Score:2)
Re:while that's true (Score:2, Interesting)
However, Yahoo is not a typical business. As the Yahoo article [mysql.com] says, "Finance managed its database needs through homegrown flat files and Berkeley DB databases. But those solutions proved to be inflexible and not scalable enough for their needs..."
Homebrew flat files will be whooped by any decent DB server, especially a blazing fast one like MySQL, any day of the week. Yahoo also tends to embrace open source and new technologies more readily. Most of their servers, for example, run FreeBSD and have since the beginning.
FreeBSD is admittedly a slightly bad example, since it's unquestionably a product you can trust your mission-critical data to, but it illustrates to a degree the type of company that Yahoo is.
I, myself, would trust MySQL with anything that didn't matter to someone else. In the end--it's politics, and "nobody ever got fired for using an ACID compliant database."
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2)
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2)
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:3, Informative)
One of the nice features of MySQL is the capability of having pluggable persistence managers. An example is the default, MyISAM, which you are correct in saying does not support ACID. But with the release of MySQL-Max [mysql.com], which happened a while ago(tm), and MySQL v4 out of the box, support for 3 additional backends was added: BerkeleyDB [sleepycat.com], Gemini, and InnoDB [innodb.com], all of which have complete ACID support. InnoDB also supports row-level locking and even an initial implementation of foreign keys.
InnoDB is in use here at Slashdot as well as at a good deal of other sites [innodb.com] demanding high-transaction throughput with full ACID support.
With the addition of foreign keys and stored procedures functionality, all of which are on the slate for the 4.x series, the reasons not to use MySQL are lessening every day.
Side note: Yeah, I know Gemini is the red-haired stepchild of the MySQL world. It's still a decent table manager.
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2, Informative)
One of my clients had a Linux Apache/PHP/MySQL box that was being heavily pounded by 250+ web-based instant messaging users. (I know... using a database to handle instant messages
1. The message tables are constantly subject to INSERT, SELECT, and DELETE queries.
2. MyISAM tables require a full lock for every INSERT or DELETE
3. InnoDB tables have row-level locking
4. Therefore, I will install MySQL-Max and use InnoDB tables for the messaging data, relieving PHP of the continuous connect...wait...reconnect cycle
Well, I read every note about configuring MySQL for performance, and I made the change to InnoDB, and promptly every single one of the 30 or 40 MySQL processes began taking up at least 60 MB RAM! The system slowed to a standstill, and I had to roll back to regular MyISAM tables 'real quick'. (the users on the system take about 10 seconds to start pestering the admin with trouble reports)
I know the real answer is to use shared memory or some such, but anyway, the performance benchmarks always show PostgreSQL ruling when you have mixed INSERTS and DELETES.
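For what it's worth, the kind of per-connection memory blowup described above is usually governed by a handful of my.cnf settings; the key distinction is between buffers shared by the whole server and buffers allocated per connection. A sketch (variable names are real MySQL settings; the values are purely illustrative, not recommendations):

```ini
# my.cnf fragment -- illustrative values only, tune to your RAM.
[mysqld]
# Shared across the whole server (allocated once):
innodb_buffer_pool_size = 64M
innodb_log_buffer_size  = 4M
# Allocated per connection/thread -- these multiply by max_connections,
# which is where "every process taking 60 MB" scenarios come from:
sort_buffer_size        = 512K
read_buffer_size        = 256K
max_connections         = 50
```

The rough worst-case footprint is the shared pools plus (per-thread buffers x max_connections), so oversized per-thread buffers hurt far more than a generous buffer pool.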
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2)
What's the real problem using a DB for instant messaging? What do you propose as an alternative?
Re:Why use PostgreSQL instead of MySQL?: ACID (Score:2, Informative)
After your bad experience did you send an email to Mr. InnoDB, Heikki Tuuri [innodb.com]? He's quite attentive and I'm sure would like to hear what happened with your application.
Not to mention that when you're playing with databases or other vaguely unknown programmatic quantities, it's best not to test them in production. If you had run the change in a test environment first, the lower pressure would have left you better equipped to resolve whatever situation was occurring.
The ACID factor and Relations (Score:2)
Anyway, you do have a good point and you've neatly summarized one requirement of a "real" database. I would just caution not to base all comparisons between two DBMSs on ACID support -- especially if one of them is MySQL. After all, there are a lot of specific things that distinguish a serious DBMS from a toy.
Not that MySQL has a lot of them. The only one that comes to mind is the SQL interpreter. A lot more is missing, and a lot of it is stuff most programmers will miss a lot more than transactional integrity.
The biggest omission, to my mind, is support for relational queries.
"BZZZT!" I hear you saying. "What do you mean MySQL doesn't support relational queries? You can do a join with select/where! And..."
Well, you can specify a join in MySQL. (Just as you can if your "database" is a simple collection of text files [eu.org].) But joins aren't very useful if you can't do nontrivial things with them. And you can't, because MySQL takes a terrible performance hit if your query involves more than one index. That's why Slashdot now numbers posts sequentially [slashdot.org]. Discouraging the first-posters was merely a side effect.
Online Backups/High Availability (Score:3, Insightful)
Re:Online Backups/High Availability (Score:5, Informative)
Online backups: yes, for quite some time
point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.
Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (ie a fully distributed database).
Sumner
Re:Online Backups/High Availability (Score:2)
My company is using Postgresql in a traditional OLTP context, and we would very much like to have the point in time recovery. At some point it could make the difference between us continuing in the open source mode or switching to (ugh) Oracle!
I do understand that PIT recovery is being worked on. I wish I had time to help!
Re:Online Backups/High Availability (Score:2, Interesting)
In fact, you can make trees of replication (good for high transaction volumes with massive redundancy) or even daisy-chain replication into a ring (giving you master-....-master).
It's too bad that it takes about 2 years to update the public perception when a product transitions from toy to tool.
Re:Online Backups/High Availability (Score:2)
If someone wanted to become active in contributing to this project, who would we need to contact? Is there a primary contact for each sub-project?
PostgreSQL development process (Score:3, Interesting)
I think, above everything else, I appreciate the development process of PostgreSQL. I have read the mailing lists for a long time now, and I have learned a lot from the comments of the developers. The "hackers" list sometimes gets well above my head, but most of the issues are very interesting to read, and are applicable in many areas of logic. I commend all the developers for their hard work as well as their great talent.
Jeff Davis
Long time mysql user, postgresql newbie (Score:2, Interesting)
Anyways, let me tell you, pgsql's user permissions still make my head swim; it's a nightmare. I mean, OK, how many different ways are there to authenticate a user: plain-text password, crypted password, now md5, ident, local ident, Kerberos, etc., etc. Seriously, what's the "preferred" way to add a normal, non-superuser account that has only SELECT, INSERT, UPDATE, and DELETE access to a given database and can connect from both the local machine and remotely? Is this even possible? (Yes, I know you have to give it the -i option for network access.)
I guess another kind of oddity about pgsql is that out of the box, it only does ident-type local socket authentication, no TCP/IP. You have to add all these weird rules to a config file to enable these mysterious "accounts with passwords". Keeping the documentation inside the actual config files kills me, btw.
I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide. I know, rtfm, and figure it out; trust me, I have. There's a boatload of PostgreSQL docs, split over what seems like 10 different manuals.
Also, would it be darn nice to include a start/stop script that reads only config files and can be linked from
Overall a good db. It definitely takes way longer than mysql to set up and understand, but the added features make it worthwhile. Even if you don't use the triggers, out-of-the-box transactions, and subquery support right away, you'll be glad when you do finally want them to help you out with data integrity. Sorry, I'm rambling; I have no real point, this is more of a "this is my experience with this thing" post. Maybe it will be of some use to someone, and hopefully I'll get an intelligent post or two setting me straight.
Re:Long time mysql user, postgresql newbie (Score:5, Informative)
Options are sometimes considered to be a good thing.
Seriously, what's the "preferred" way to add a normal, non super user, only has select, insert, update, and delete access to a given database that can connect from the local machine, and remotely. Is this even possible?
Add something like this to your pg_hba.conf:
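[The example itself appears to have been lost in archiving. Judging from the follow-up comment, it presumably resembled a 7.2-era pg_hba.conf entry along these lines, using "trust" (which, as the reply points out, performs no real authentication):]

```
host    all    192.168.0.0    255.255.255.0    trust
```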
I guess another kind of oddity about the pgsql is that out of the box, it only does ident type local socket authetication, no tcp/ip.
We like the default setup to be reasonably secure.
I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.
try here [postgresql.org]
Also, would it be darn nice to include a start/stop script that reads only config files and can be linked from
It's in contrib/start-scripts. Or you might as well download the RPMs.
Re:Long time mysql user, postgresql newbie (Score:3, Informative)
That's not authentication! "trust" just allows logins, period. Try "psql -U postgres" as anyone on that machine. You'll instantly be logged in as the superuser.
Something like this works fairly well on Postgresql 7.1:
host all 127.0.0.1 255.255.255.255 ident sameuser
host all 0.0.0.0 0.0.0.0 password
Then enable TCP/IP connections ("tcpip_socket = true" in postgresql.conf)
Very important: make sure your ident server is trustworthy. Many ident servers have an option to allow a user to fake identification. Turn it off.
Also, the config I posted there will let any user connect to any database. That's the simplest, but not the most secure. The "sameuser" in the database field won't be enough to let the superuser connect to databases. You might add a separate line for that with an ident map containing only postgres (the file would have only the words "postgres postgres" in it, on one line), and then "all" in the database field with that map. I.e., "host all 127.0.0.1 255.255.255.255 ident postgres"
For remote connections, just make sure they have a password in the database:
create user slamb with password '12345';
alter user bob with password 'newpassword';
There's no authentication method specified here for UNIX domain sockets, so they just don't work. You'll need to set the PGHOST="localhost" environment variable for stuff to authenticate correctly. I did this because pgsql 7.1 did not support ident on UNIX domain sockets; pgsql 7.2 now does, on certain platforms. (Just replace "host <db> <ip> <netmask> ident <map>" with "local <db> ident <map>")
pgsql 7.2 adds pam support. If your UNIX and PostgreSQL usernames correspond, it should work.
pgsql 7.2 also adds support for encrypted passwords. There's an option for storing passwords encrypted in the database and an option for challenge-based encryption. I think these methods are incompatible - good challenge-based encryption requires the password be stored in plaintext on the server.
There has been Kerberos auth for some time. I'm trying to switch over to this now, as I'm setting up Kerberos on my network. It's a more complicated system to set up correctly, though. Get something else working first.
Official docs are here [postgresql.org]
Whew, that was close (Score:4, Funny)
Replication? (Score:2)
PostgreSQL Books (Score:3, Informative)
There are a few decent books about PostgreSQL out there now. It is so much nicer than a few years ago.
Practical PostgreSQL. [amazon.com] I think this one just came out as a bound book. I just got it a couple days ago and it is pretty good. It is also online. [commandprompt.com]
Postgresql : Developer's Handbook. [amazon.com] I (as a developer) like this one best of all that are out now.
PostgreSQL Essential Reference. [amazon.com] This one is pretty good, but I would not say it is essential. :-)
Beginning Databases with PostgreSQL. [amazon.com] This is one of those Wrox books which is about 10000 pages, including 80% of what I want to know and 2000% of what I don't.
There are others, but I think they are weaker. I was disappointed with the one just called PostgreSQL. [amazon.com]
PostgreSQL Documentation (Score:2, Informative)
(This is also of course online, e.g. the 7.1 version is up at http://www.ca.postgresql.org/users-lounge/docs/7.
So how do we encourage other projects to develop such documentation? I assume it is very time-consuming and difficult, with very little reward. (How often do we see posts appreciating new software features compared to ones acknowledging the associated documentation?)
The all-important question: upgrading. (Score:2)
When should we do it? Should we wait a couple of weeks for any possible patch/point release, or is this one granite-rock-solid?
Why should we do it? The current DB is working just fine. Are there any compelling new features?
How do we do it? Are the database files binary compatible? Do we need to export and reimport data? Are the RPMs of postgres 7.2 out?
Test it out first (Score:2)
Your applications might break and need modification.
So test it out first - non live.
Even if there are no known issues you should test it out first; you might unluckily be the first one to find out, you know?
Re:The all-important question: upgrading. (Score:2)
A few good features...
- vacuum doesn't need to lock things up all the time, which is definitely nice.
- the speed increase is noticeable, especially on a dual P3 SMP.
- there have been a few changes to a number of the datatypes, like the increased resolution on timestamps (now recorded with milliseconds), and changes to the char and varchar types (they now reject strings longer than their limits -- i.e. a varchar(30) no longer accepts 31 characters by silently cutting off the excess).
- you can supply passwords via md5 hashes rather than plaintext or using crypt, which is definitely good, especially for connections made over the wire. You can also supply the passwords via UNIX domain sockets now, which for some reason you couldn't before. (You had to start postmaster with -i and connect via an INET port for that sort of authentication. Ew.)
The databases are not binary compatible, so upgrading from 7.1 to 7.2 will require a pg_dump from 7.1, an initdb, and a restore into 7.2. Life is rough.
I've been working with it quite a bit and I haven't had any real problems throughout beta testing and the RCs. The "production" site will probably be upgraded later today, or tomorrow.
I have no idea about RPMs, since I always compile from source, but check out freshrpms or rpmfind.net or something.
J
Windows version? (Score:3)
Yes, I know it should be avoided, but I'll ask anyway since sometimes you have no other choice :(
I'm currently using MySQL on a Win2k server and it actually runs pretty okay and is very easy to install. What about PostgreSQL? Last I looked at it, it was a lot more difficult to get running on Windows. Has that changed, or is PostgreSQL still more or less *nix only?
Re:Windows version? (Score:2)
Re:Windows version? (Score:2, Insightful)
I myself, with no experience of Cygwin, found it a pain in the ass to get postgres working on Windows. There was no quick and easy howto on the postgres site at the time; I'm still unsure if a quick step-by-step guide exists today.
I'm sure this puts a lot of people off from initially trying out postgres. Mysql makes it a lot easier to check out on Windows: simply run the installation program and it's up and running.
It runs under Cygwin!!! (Score:2, Informative)
Question (Score:2)
Is there any plan to add this or some equivalent? Is it already there and I just missed it? (It's one of the reasons I haven't done versions of my open source releases for Postgres.)
Wish List (Score:3, Interesting)
1. Point in time recovery
2. Reconstruct SQL from write ahead log
3. Function based indexes with SQL rewrite
4. Materialized views with SQL rewrite
5. Analogue to Oracle's v$sqlarea
6. Wait statistics
7. Tablespaces
8. Inline views (from clause subselects)
9. Parallel query capability
10. Partitioned tables
11. Bitmap indexes
12. IO monitoring (read/write per object)
13. Dynamic sort and hash area allocation
14. Detailed SQL tracing (rows per plan step)
15. Multiplexed WAL writes
16. SQL optimizer hints
PostgreSQL issues (Score:2)
Alter table improvements. Changing schemas now, with triggers etc., is a nightmare. How are others doing this?
I can't figure out how to avoid putting "" around "every" "single" "identifier", which is annoying. (PostgreSQL folds unquoted identifiers to lowercase, so once you create mixed-case names with quotes you're stuck quoting them everywhere.)
Continued increased speed and performance. We can do database-backed sessions in MySQL damn fast, and it has been tricky to get postgresql to perform as well.
Reclaim space without forcing table locks. The fact is I'm always amazed folks can even lock their tables for an hour. How do you run 24x7 systems with that? The new VACUUM is a great step.
Improve the website. It is ugly and hard to navigate. Just copy php.net/mysql.com or some other reasonable developer portal.
Re:Excellent!! (Score:3, Informative)
Keep in mind that when you abbreviate PostgreSQL to "Postgres", you're really talking about a separate, older product.
I guess it doesn't really matter, since Postgres is long gone, but it still annoys me every time I see it.
Re:Excellent!! (Score:2)
I guess it doesn't really matter, since Postgres is long gone, but it still annoys me every time I see it
To call PostgreSQL a "separate" product is not entirely accurate. The reason for the name change (first to Postgres95, and later to PostgreSQL) was that they dumped the PostQuel query language in favor of good ol' SQL. Under the hood, it's still the same DBMS (although somewhat enhanced and several versions more mature).
Go to PostgreSQL.org [postgresql.org], and you will discover that in the opening pages of their tutorial is the following line:
So if you have a problem with people saying "postgres" instead of "postgreskew-ell", take it up with the folks writing the documentation.
Re:Excellent!! (Score:2, Informative)
INGRES and POSTGRES were two separate projects at Berkeley that shared no code. PostgreSQL is based on the latter.
See also the POSTGRES FAQ [berkeley.edu].
Re:some might disagree (Score:5, Interesting)
They're not using MySQL to store all their critical data. They're dumping all their data, presumably from some other, more reliable database (Oracle, it sounds like), into mysql for quick web searches.
IOW, they're using mysql for what it does best: As a fast datastore for when data integrity isn't important (because they have all the data backed up in Oracle and could redump it to mysql at any time).
Admittedly, some of this post is conjecture, but you'd be crazy to suggest that the Census Bureau would trust all their critical data to mysql.
Re:some might disagree (Score:1)
Re:some might disagree (Score:3, Informative)
Re:Postgres vs. MySQL (Score:2)
Postgres is rock solid, but notoriously slow. MySQL is wicked fast, but doesn't scale well.
Oh puh-lease. That old "postgres is slow" argument is ancient. While mysql may be faster for simple insert/selects, that's where it stops, and compared to today's postgres it's not that much faster. Update your FUD, please.
Re:Postgres vs. MySQL (Score:2)
So, you agree that postgres is slower, but it's FUD when I say it's slower. Whatever, dude.
Re:Postgres vs. MySQL (Score:2)
So, you agree that postgres is slower, but it's FUD when I say it's slower. Whatever, dude.
Read your comment again:
"Notoriously slow" is where I have the problem. It's not notoriously slow. MySQL is marginally faster than PostgreSQL on simple insert/select statements, but that's it. So yes, you are spreading fud, dude.
Re:Postgres vs. MySQL (Score:2)
2. The word "notorious" does not mean extreme. It means "well noted". Since even you, who are obviously a passionate defender of all things postgres, acknowledge that MySQL is at least slightly faster, my statement was essentially correct.
Learn what words mean before you start shooting your mouth off.
Re:Postgres vs. MySQL (Score:2)
<sigh>
The acronym "FUD" does not mean "saying bad things about a program". FUD stands for "Fear, Uncertainty and Doubt".
Almost correct. Taken purely as an acronym, that is what FUD means. However, the definition has grown to include tactics such as half-truths and downright wrong information. Your statements could have been taken as FUD because you were saying that PostgreSQL is notoriously slow. That's just like "Linux is useless compared to Windows on the desktop," or "MySQL doesn't support transactions." These things may have been true in the past but are not now, so they are considered statements of FUD. Back to my particular statement: you're casting doubt over the performance of the system.
The word "notorious" does not mean extreme. It means "well noted".
That is exactly correct. Your well-noted observation that PostgreSQL is slow is dated and unsubstantiated. Even older versions of PostgreSQL fare much better than MySQL when you start doing more than one update to a table at a time or involve anything but trivial inserts and selects. Earlier versions also defaulted to synchronous disk access, which hurt performance significantly. Both of these facts are well noted as well, but you did not bring them up, which to me also helps the FUD claim: you're only telling half the story.
I'm not really a passionate defender of all things Postgres; I just hate seeing the same old tired and dated arguments brought up again and again as they simply don't apply anymore.
I know what the words mean; don't try to cast uncertainty and doubt over my English proficiency, please.
Re:Postgres vs. MySQL (Score:2)
Even had you not conceded that point (which makes everything I said *fact*, rather than *FUD*), I don't really feel like I need you to tell me how spiffy postgres is, because, like I said, I use postgres all the time, and have compared its speed to enough other systems to know how well it performs.
What astonishes me most of all about your whining about "FUD" is that, when I posted my original message, I actually thought it would be the MySQL fans that would be more likely to get pissed off at me (for calling MySQL a rake to postgres's tractor). The fact that you took that post, which basically said that both DBMSs have their uses, as a "FUD" attack on postgres leads me to the only possible conclusion: that any conversation with you must be like walking on eggshells, because you are too fucking touchy. Take a few breaths, re-read my original post, and you just might realize that it was posted by somebody with no agenda against your favorite program.
Re:Postgres vs. MySQL (Score:2)
You said yourself that postgres is, in fact, slower than MySQL. There's a lot of good reasons why it's slower. An enterprise-level database needs certain features (which MySQL lacks) which will always cause a performance hit.
You're quite correct. However, and as I'm sure you're aware, PostgreSQL is oftentimes faster than MySQL when you're doing anything other than simple inserts and selects. If you'll read my original post again you'll see that I was upset when you didn't justify the "slower" than MySQL; you just blanket-statement said that it was slower, which is often taken as "always."
What I don't understand is why you think I'm the one walking on eggshells, or if I'm trying to convert you to PostgreSQL. All I was doing was calling you to task on your statement. You say you've got all this database experience, so I'm sure that we're just arguing over semantics here. (By the way, I do thank you for keeping the argument civilized; it's rare on /.) MySQL is faster than PostgreSQL for the simple stuff. We both agree on that. PostgreSQL is taken more seriously than MySQL because of the features; I think we both agree with this too. And finally, PostgreSQL is faster than MySQL when you are doing any "real" database work (more than just simple inserts and selects on simple tables); I think we both agree on this as well.
As far as my resume goes: so what? I have enough relevant experience to discuss these matters, and it seems that we're just arguing over nothing anyway; does a 30-year veteran have anything more important to say than a 3-year programmer? We're not talking about something where vast differences in experience make much difference, if any.
Re:Great. (Score:2, Interesting)
One of the biggest problems I have with database servers is all the passwords that end up floating around semi-protected; this is a GREAT new feature!
Re:So when should we expect... (Score:3, Interesting)
Done...
I've got 7.1 running on my Titanium Powerbook...
there are really nice MacOS X packages of Postgres at Marc Liyanage's home page [entropy.ch]
I also take this occasion to thank him for the nice MacOS X packages he's put together...
Quentin
Re:Drop column? (Score:2)
Re:Minor annoyance with the release (thus far) (Score:2)
If done right (i.e. --prefix=/usr/local/pgsql72) you can easily compile, install and have everything ready to go.
Initdb to the
Turn down the webserver to the 7.1 system. Close any applications.
Last but not least:
/usr/local/pgsql72/bin/pg_dumpall -h localhost |
Of course, this takes some time and requires double the disk space. While this is running upgrade any changes to your software that you wish to apply (datatype changes, features being outdated, etc.).
As soon as it's done, shut down both the 7.1 and 7.2 databases. Reset the port and socket to the default location for 7.2 and fire them back up. Alternately, you can have your application look in a different spot for a connection.
Fire back up the webserver, or re-enable connections to the system. A 4GB database using the above method with a relatively simple structure (100 tables, 50 or so functions, etc) took about 10 minutes of actual downtime.
Of course, several test runs should be done beforehand. I strongly suggest you enable md5-style passwords at the same time and update the users' password storage mechanism -- ODBC doesn't support these yet, that I know of.
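The whole parallel-install dance above can be sketched as a script. Paths, ports, and data directories here are hypothetical examples, not the poster's actual setup; DRYRUN=1 (the default) just prints each command so you can review the plan before running it for real:

```shell
#!/bin/sh
# Parallel-install upgrade sketch, 7.1 -> 7.2. All paths are illustrative.
OLD=/usr/local/pgsql       # existing 7.1 installation, serving on port 5432
NEW=/usr/local/pgsql72     # fresh build, configured with --prefix=/usr/local/pgsql72
DRYRUN=${DRYRUN:-1}

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRYRUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

run "$NEW/bin/initdb" -D "$NEW/data"
run "$NEW/bin/pg_ctl" -D "$NEW/data" -o "-p 5433" start
# Dump the live 7.1 cluster with the 7.2 tools, restore into the 7.2 server:
run sh -c "$NEW/bin/pg_dumpall -h localhost -p 5432 | $NEW/bin/psql -p 5433 template1"
# Then: shut both down, move 7.2 onto the default port/socket, fire it back up.
```

As the parent notes, this needs double the disk space while both clusters exist, and the real downtime window is only the final port swap at the end.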