UML, PostgreSQL Get Corporate Support
tcopeland writes "An article on NewsForge highlights some changes in the upcoming PostgreSQL release (v7.5) that are funded by Fujitsu. PostgreSQL core team member Josh Berkus says that "Tablespaces, Nested Transactions, and Java support" are being underwritten by Fujitsu; this has also been mentioned on the postgresql-hackers list. He also says that 7.5 will be "...the most significant new release of the software since version 7.0 almost four years ago". Good times for PostgreSQL users!" And ggoebel writes "Jeff Dike posted a notice to the UML [User-mode Linux] developers mailing list: 'The first bit of news is that as of last Monday, I am working for Intel. They
generously offered a full-time position, off-site, with my time mostly spent
on UML. This basically means that UML is no longer a part-time, after-hours
thing for me, so we should start seeing more work happening on it, especially
compared to the last month or two.'"
UML is pretty awesome (Score:3, Informative)
clarification please... (Score:3, Informative)
(1) Unified Modeling Language?
or (2) User Mode Linux?
Methinks (2), given that I work a lot with (1) and have never heard of Jeff Dike.
UML (Score:5, Informative)
UML (Score:2, Informative)
Oh, you meant User-mode Linux? Well, why didn't you say so? Sometimes I think these writeups are intentionally confusing.
Re:Good to Hear... (Score:5, Informative)
Au contraire, there are PHP interfaces for PostgreSQL, Oracle, Sybase, and MSSQL built right into the source distribution. I seem to recall that back in the Bad Old Days before Mac OS X, when you had to compile things yourself, building PHP with all the necessary libraries was a huge pain, but now it's a trivial thing. Marc Liyanage maintains a PHP module package [entropy.ch] that snaps right into the built-in Apache web server on your Mac, and it already has most of the necessary bells and whistles [entropy.ch] built in.
Good tools out there for PostgreSQL.... (Score:4, Informative)
PLUG: For example, there's this little SQL query analysis [postgresql.org] utility!
Re:Table spaces? (Score:5, Informative)
You are referring to two completely different technologies:
(1) "Writing directly to disk cluster" - By that you seem to mean direct disk access, not through the filesystem. I don't even think this is part of the PostgreSQL TODO, because there is just not a very strong need. Are you experiencing performance problems in this regard?
(2) "fragment tables across spaces" - By that you mean "Table Partitioning". That allows you to break up a single table across multiple storage devices. That would be very valuable technology, but as far as I know, it won't make 7.5.
If all these features really work out for 7.5, they should call the release 8.0, and maybe they will.
*: There are some tricks you can use if you need to move a single table to a different device prior to 7.5. I think symlinks work fine, but if it's important, I'd wait for 7.5 or ask on the -general list to make sure it's correct.
Re:Table spaces? (Score:2, Informative)
They also do not have table partitioning. It has been discussed and it is a high-priority feature, but it doesn't seem like anyone has seriously tried to tackle it yet. I'm guessing that it will be on the radar for the next release though.
Tablespaces basically just let you partition your db across different volumes, but a single table cannot be split up.
I am not a developer but this is what I have gleaned from the hackers list.
More servers running PostgreSQL... (Score:3, Informative)
Props to Tim Perdue for picking a solid database on which to build GForge [gforge.org]!
User-Mode Linux Management (Score:5, Informative)
I had a few problems getting it started, but the developers were very helpful.
Re:UML is pretty awesome (Score:1, Informative)
Re:UML is pretty awesome (Score:1, Informative)
Re:Table spaces? (Score:5, Informative)
Strictly speaking, that's not true. You can move things around manually, and some have done so, but it's not pretty, not easy, and not easy to maintain. Implementation of tablespaces in PostgreSQL simply allows its users to easily do what was previously an arcane-voodoo art. So clearly, it's a big step up. But, you already knew that.
"Writing directly to disk cluster" - By that you seem to mean direct disk access, not through the filesystem. I don't even think this is part of the PostgreSQL TODO, because there is just not a very strong need. Are you experiencing performance problems in this regard?
That's correct. AFAIK, there is no desire to implement raw partition support. The speed difference is minimal and the required code is large. Basically, you wind up writing a FS and associated buffer management into the database. The return generally is not very high. It used to be, many years ago. These days, filesystem technology and implementations are plenty fast. Those that want raw partition access, IMO, are simply living in the past.
If all these features really work out for 7.5, they should call the release 8.0, and maybe they will.
You are correct. According to the list, the numbering constantly goes back and forth. From what I gather, they are waiting to see what features actually make it in. Depending on the scope of changes, they'll then determine the version number. As a rule of thumb, people are calling it 7.5, simply because nothing else has been blessed.
Please don't think I'm correcting what you've said. You've said nothing that I disagree with. I'm simply adding a followup remark.
Cheers!
windows port (Score:3, Informative)
Even though it is currently in beta, it works very well. The port is now being downloaded over 2000 times a week, and that number is increasing all the time.
Also in PostgreSQL 7.5 - Native Windows Port (Score:4, Informative)
Google didn't exist when user-mode linux started (Score:3, Informative)
message from jeff [iu.edu]
Unified Modelling Language may have existed in early 1998; I first saw it in April 1999. But Unified Modelling Language was a lot smaller back then.
And Google did not exist in February 1998!
These days, when I need to name something, I stick the name in Google and check for conflicts.
Re:Table spaces? (Score:5, Informative)
> database files on a RAID and letting the OS split the table across devices?
Sure, you might want to distribute your data across multiple arrays. For example, keep your logs and temp space on a fast and expensive RAID 0+1 array of fast (15k) drives. Then put small OLTP stuff on another RAID 0+1 array. Then put your huge graphic images, documents, etc. on a much more economical RAID 5 array.
I use multiple arrays all the time for performance and economics (in DB2 and Oracle); it's cool to see Postgres pick it up.
Re:Google didn't exist when user-mode linux starte (Score:3, Informative)
According to this [usc.edu], UML 0.9 was from 1996, UML 1.0 was 1997.
Re:Table spaces? (Score:4, Informative)
However, for larger or more complex systems there are some advantages to splitting tables over multiple disk systems. For example, tables with lots of little niggling disk writes (access tables, change logs, temp tables) can go on a fast (possibly striped) disk system. You don't have to waste high-priced, high performance RAID on archived data (if it crashes, restore from tape), or on large media files etc stored as blobs or clobs.
These are just examples, but on a large server with several different disk systems available, this technology lets the database designer match storage system performance characteristics much more accurately than a simple RAID.
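To make the "match tables to storage" idea concrete, here is a minimal sketch that generates tablespace DDL from a placement plan. The tablespace names, mount points, and table names are all made up for illustration. The statement shapes (CREATE TABLESPACE ... LOCATION and ALTER TABLE ... SET TABLESPACE) are the ones documented in later PostgreSQL releases; whether the release discussed in this thread ships exactly this syntax is an assumption.

```python
# Hypothetical layout: every name and path below is invented for illustration.
# Only the SQL statement shapes follow PostgreSQL's documented tablespace syntax.
TABLESPACES = {
    "fast_raid10": "/mnt/raid10/pgdata",   # small, write-heavy tables
    "cheap_raid5": "/mnt/raid5/pgdata",    # archives and large media
}

TABLE_PLACEMENT = {
    "access_log": "fast_raid10",
    "change_log": "fast_raid10",
    "archived_orders": "cheap_raid5",
    "media_blobs": "cheap_raid5",
}


def tablespace_ddl(tablespaces):
    """Return one CREATE TABLESPACE statement per storage location."""
    return [
        f"CREATE TABLESPACE {name} LOCATION '{path}';"
        for name, path in sorted(tablespaces.items())
    ]


def placement_ddl(placement, tablespaces):
    """Return ALTER TABLE statements that move each table to its tablespace."""
    statements = []
    for table, space in sorted(placement.items()):
        if space not in tablespaces:
            raise ValueError(f"{table} refers to unknown tablespace {space}")
        statements.append(f"ALTER TABLE {table} SET TABLESPACE {space};")
    return statements


if __name__ == "__main__":
    # Print the statements instead of executing them; no server is needed.
    for stmt in tablespace_ddl(TABLESPACES) + placement_ddl(TABLE_PLACEMENT, TABLESPACES):
        print(stmt)
```

Note that each table lands in exactly one tablespace, which matches the point made elsewhere in the thread: tablespaces spread a database across volumes, but they do not split a single table.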
Re:Why corporate self-interest can be good for OSS (Score:3, Informative)
You're right about this being for dedicated postgres boxes, but then dedicated database machines are exactly what you find in large enterprises. The "dot com" I work for has a big iron Sun running Oracle and nothing else, and a large number of smaller machines that do the "everything else". I think you'll find that fairly typical.
Re:GUI Tools (Score:1, Informative)
http://ems-hitech.com/pgmanager/index.phtml
Re:OLAP still missing... (Score:2, Informative)
For the uninitiated, OLAP stands for online analytical processing. In layman's terms, this refers to the process of interactive analysis of data, typically via incremental queries that progressively slice, dice, and refine the data set in order to reveal non-obvious relationships between various parameters.
OLAP is typically performed on data that is of medium age; i.e., not just current data, as would be found in a typical operational database, but maybe not the full long-term historical data, as would be found, say, in a data mining environment. Of course, different types of data and different application scenarios make such generalizations somewhat problematic, but, generally, OLAP is focused on analysis of, say, the last year or two of data. Regardless, the data sets returned by OLAP queries are typically quite large. As a result, special techniques, distinct from those used for traditional transaction processing, are usually employed in order to meet query response time requirements, which are often key requirements for OLAP systems.
One technique often employed is the use of so-called "star" or "snowflake" schema. This form of schema is quite different from the very normalized schema of transaction processing systems in that the data are organized into central "fact tables" with related dimension tables. Dimensions are things like date, location, product, etc., and have attributes that allow fine-grained querying of the facts in the fact tables. These dimension tables are also constructed in a way that reflects natural hierarchies; e.g., a date dimension would allow queries by year, month, week, day, etc.
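A tiny star schema may make this easier to picture. The sketch below uses Python's built-in sqlite3 module purely so it runs anywhere; the table names, columns, and sample rows are invented for illustration and are not taken from any particular OLAP product.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year INTEGER, month INTEGER, day INTEGER
        );
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name TEXT, category TEXT
        );
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
    """)
    # Illustrative sample data only.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (1, 2004, 1, 15), (2, 2004, 1, 20), (3, 2004, 2, 3),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (10, "widget", "hardware"), (11, "manual", "books"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        (1, 10, 100.0), (2, 10, 50.0), (2, 11, 20.0), (3, 11, 30.0),
    ])
    return conn


def sales_by(conn, level):
    """Sum sales grouped by a level of the date hierarchy ('year' or 'month')."""
    columns = {"year": "d.year", "month": "d.year, d.month"}[level]
    query = f"""
        SELECT {columns}, SUM(f.amount)
        FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id
        GROUP BY {columns} ORDER BY {columns}
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_star_schema()
    print(sales_by(conn, "year"))
    print(sales_by(conn, "month"))
```

The same fact rows answer both the yearly and the monthly question; only the attributes pulled from the date dimension change, which is the hierarchy idea described above.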
While such schemas can be defined in traditional transaction processing systems, OLAP-aware database systems typically incorporate design elements that optimize processing of queries on such schemas. OLAP queries are focused on examining aggregates of data across the various dimensions, such as sums, averages, etc. These aggregates may be precomputed on selected chunks of the overall data set to speed up online queries, but the query processor needs to be able to identify opportunities to take advantage of such things. So, optimizing queries for OLAP is a key feature of an OLAP-aware system.
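Here is a toy sketch of the precomputed-aggregate idea, again using Python's sqlite3 module. The table names and the hand-written "use the aggregate if it exists" check are assumptions made for illustration; a real OLAP-aware system would make that decision inside its query optimizer rather than in application code.

```python
import sqlite3


def build_db():
    """Create an in-memory fact table with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (year INTEGER, month INTEGER, amount REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        (2003, 12, 40.0), (2004, 1, 100.0), (2004, 1, 50.0), (2004, 2, 30.0),
    ])
    return conn


def precompute_monthly(conn):
    """Materialize monthly sums and row counts as a summary table."""
    conn.execute("""
        CREATE TABLE agg_sales_month AS
        SELECT year, month, SUM(amount) AS total, COUNT(*) AS n
        FROM fact_sales GROUP BY year, month
    """)


def has_table(conn, name):
    """Return True if a table with this name exists in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def yearly_totals(conn):
    """Return (source, rows), rolling up from the monthly aggregate if present."""
    if has_table(conn, "agg_sales_month"):
        source = "aggregate"
        sql = "SELECT year, SUM(total) FROM agg_sales_month GROUP BY year ORDER BY year"
    else:
        source = "facts"
        sql = "SELECT year, SUM(amount) FROM fact_sales GROUP BY year ORDER BY year"
    return source, conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(yearly_totals(conn))   # answered from the raw fact rows
    precompute_monthly(conn)
    print(yearly_totals(conn))   # same totals, answered from the summary table
```

Sums roll up cleanly from month to year. Averages do not, which is why the summary table also stores a row count: a yearly average would have to be rebuilt from the monthly totals and counts rather than by averaging monthly averages.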
Another feature of an OLAP-capable system is some sort of API for creating the various components needed, e.g., the schema, definitions for any pre-computed aggregates, defining rules for "rolling up" from lower levels of a dimension's hierarchy to higher levels, etc. Oracle's OLAP, for instance, provides several techniques for accessing OLAP data and metadata, but they mostly boil down to either a Java API (high-level) or a more arcane, lower-level API for more direct access. The API(s) available to program an OLAP application can be critical in determining the ease with which applications can be created, and the types of applications that can be created.
Does this help a little?
Re:Why corporate self-interest can be good for OSS (Score:2, Informative)
I take some exception, however, to your view on raw partitions vs. filesystem-based storage. At least in the Oracle world, most studies and expert opinion I have viewed generally recommend against use of raw partitions. With appropriate use of RAID and suitable filesystem selection, the overhead associated with filesystem storage is usually not considered significant, despite many folks' assumptions otherwise. When you consider the difficulties in managing storage over time--e.g., altering tablespace mappings to files, expansion of tablespaces, equalization of I/O--use of filesystems makes such administration much more straightforward. Tom Kyte, a highly respected technical expert at Oracle, strongly recommends against the use of raw partitions unless you just can't stand the 2-3% performance hit.
That said, raw partitions have been required in "Real Application Clusters" (RAC) environments (previously known as Oracle Parallel Server (OPS)), at least until the mainstream acceptance of so-called cluster filesystems. It is my understanding that Oracle's work on clustered filesystems is aimed at allowing RAC systems to enjoy the substantial benefits of filesystem storage.
Re:PostGreSQL needs online backup (Score:1, Informative)
Re:Postgres is kicking butt (Score:3, Informative)
Re:OLAP still missing... (Score:3, Informative)
efeu [cybertec.at]
Re:postgre who? (Score:1, Informative)
Many people who use PostgreSQL want it to continue to advance, and do NOT want it to become like MySQL.
None of these features will make it harder to install or use a basic installation; they are advanced features that allow particular economic or performance requirements to be met.
Re:GUI Tools (Score:2, Informative)
Actually, you may find pgadmin2 a better choice for now. It has a migration plugin that works wonderfully. AFAIK, this plugin is not yet available for pgadmin3, and doesn't yet appear to be a priority, as it should be, IMO.
PGAdmin2 is not available for Linux. I can only assume you use Linux since you mentioned pgaccess. I've not heard of a Win port of it, but since it is written in Tcl/Tk, it would probably be fairly easy to port. PGAdmin2 may even run fine under WINE (not tested).
However, with that said, the former poster was correct: MS Access DOES work very well with PostgreSQL. There are a few problems, but I've always managed to work past them.
LeX