DRBD is a heartbeat-based multi-server failover and replication system. It can be used as a simple two-server failover solution, or as a clustered high availability platform.
A presentation on DRBD will open with an overview of failover solutions, then cover core concepts and the topologies normally used with DRBD.
Next, a hands-on system build will be performed on a simple two-server network.
Once DRBD is up and running, a few failover scenarios will be presented with the live demonstration system.
This class provides a hands-on approach to high availability. This day-long class will focus on configuring multiple high availability scenarios in a lab environment in order to familiarize participants with the mechanics involved.
The day will begin with a 'level-off' overview to get everyone up to speed with general H/A before moving into the technical hands-on portion:
* Interfacing with the business
* Commercial products
* Technical Hands-on
Each configuration lab will include discussion of the business differentiators of each strategy and how it applies to business continuity.
Technologies used in lab:
* DB level replication
- Choose: Slony / Londiste / Bucardo
* Streaming Replication
* Active / Passive Failover
- Heartbeat / Linux HA
* If time allows, special requests may be accommodated (if provided ahead of time)
* Computer capable of hosting 2 VMs
(VMs will be available for download ahead of time)
* Familiarity with the Linux OS
* Most demos will involve command-line instructions
* Familiarity with basic PostgreSQL functions (starting/stopping the database, psql, running scripts)
* Willingness to learn!
This hands-on workshop will present the Ruby programming language, the Rails web application framework, and their use with PostgreSQL. We will spend the morning digging into the Ruby language in order to form a solid foundation for the afternoon session, in which we start building a Rails application backed by PostgreSQL. We will also learn how to manage your database schema and make use of Postgres-specific features from a Rails app.
Knowledge of object-oriented programming is assumed.
This tutorial starts with an introduction to the most common setup tasks and best practices for getting good performance from a PostgreSQL server. By the end, you'll learn how to tune a PostgreSQL server the only way that matters some days: when the server's performance problems are so bad it has crashed, and you have no idea what's wrong. This is a unique chance to see the habits and tools used by someone who faces that problem constantly, watching the repair of a live demo server that has crashed due to overload.
PostgreSQL is a database system that can deliver excellent performance for a wide variety of applications. But getting the performance-related aspects of a new PostgreSQL deployment right can be stressful, particularly if you're new to the database and its quirks. The first part of this tutorial goes over the most common things new PostgreSQL installations do that sabotage their potential performance, and how to avoid making some of those mistakes.
And when performance drops far enough, it's possible for the server to slow to where it's essentially down, or even crash altogether. Knowing what to do then is just as important, but it's hard to practice for that day.
Greg is going to log in to a server that is performing terribly at the start. By the end, it will be running smoothly. You get to watch every step of the way, to see what changes were made and what collected information prompted them. One day, you may find your PostgreSQL database server is unresponsive or down due to performance issues. This session gives you a unique opportunity to prepare. Before it happens to you (again?), you can watch how an expert who does this exact type of support and consulting every day responds.
The material covered will introduce you to:
The troubleshooting session will happen on a Linux server, using a mix of command-line and web tools; most of the techniques apply to other platforms as well. Slides summarizing the changes made will be provided, so you can focus on the demo rather than taking notes.
In this course you will learn the essential details of PostgreSQL administration. This course is designed for experienced database administrators who are new to the PostgreSQL platform. This day-long course will also include a 45-minute lab exercise focusing on new features in PostgreSQL 9.0, including streaming replication.
During training participants will cover the following topics:
A hands-on workshop focusing on the practical aspects of normalization and what it means to new database designers and programmers. Also suitable for project managers or business analysts who may not be familiar with data design practices.
This mini-tutorial is aimed at PostgreSQL users who want to learn more about PostGIS.
You should be familiar with SQL and the concepts around a relational database. No previous experience with GIS is required.
Some topics we'll cover:
Users are welcome to follow along with their own instance of Postgres during the tutorial.
R is the leading free software environment for advanced mathematical and statistical computing and graphics. PL/R is a PostgreSQL extension that allows the use of R from within PostgreSQL for advanced analytics in a simple, efficient, and controlled manner.
This presentation will introduce the audience to PL/R, starting with installation on various operating systems, then taking them through basic usage and a few more advanced examples of its power.
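As a taste of the basic usage the talk covers, here is a minimal sketch of a PL/R function; the function name and data are our own illustration, not taken from the presentation:

```sql
-- Assumes the PL/R extension is installed and registered in the database.
-- In PL/R, a float8[] argument arrives in R as a numeric vector named arg1.
CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS $$
    median(arg1)
$$ LANGUAGE plr;

SELECT r_median(ARRAY[1.0, 3.0, 2.0]);  -- returns 2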
This will be a 90 minute Mini-Tutorial on PostgreSQL High Availability.
We'll cover a number of HA building blocks such as Warm/Hot Standby and
connection pooling, then we'll spend some time looking at HA architectures and
specific scenarios. Topics include:
- Warm Standby
- Hot Standby
- pgpool-II
- IP Alias Management
- Putting it all together (HA Architectures & Scenarios)
- Failover, Switchover, and Reset
- HA walkthrough of at least one end-to-end solution (with failover)
- We'll also cover some upcoming technologies:
Project Mogwai can be found at mogwai.sourceforge.net. [I am not affiliated with Project Mogwai.] There are several data related components, but the focus of the mini-tutorial is ERDesigner.
A real-world use case will be presented, involving:
1. Reverse engineering of an existing database.
2. ERD creation and modification.
3. Testing with automatically generated DDL.
4. Provisioning for production needs.
Media will be available with the Java runtime and ERDesigner, so bring your laptop!
The features in PostgreSQL 9 allow replication to systems whose only communication with the master is a stream of files sent to it, and you can then run queries against that copy. repmgr is free software that makes it simpler to build read-scalable clusters of PostgreSQL nodes using this feature, and then to manage them in production.
Once you have multiple nodes in a cluster like this, there are several common concerns that everyone encounters. How do you monitor lag between the servers? How do you gracefully switchover to a new master? And when your master fails, how do you promote one of the standby systems to that role, and then get all of the other standby systems to start talking to it?
The repmgr software is a PostgreSQL add-on that lives outside the database to make these chores easier to do. It automatically computes and reports on lag in human-readable units. It includes a monitoring daemon that tracks node state across the whole cluster. Using a simple command-line interface, you can execute complicated operations that touch every node in the cluster, all from one place. And the V2.0 release adds unique automatic failover capabilities.
This talk will introduce how to setup a cluster with repmgr and what sort of tasks you can expect it to help automate. If you're concerned you'll find yourself in a difficult situation where you need to make decisions like "which node has the most recent copy of my data now that the master has failed?" and then shuffle around the configuration of every remaining node, you'll have just the right tool needed for that type of problem. You'll also learn how replication monitoring has changed in PostgreSQL 9.1, and how repmgr fits into the newer toolkit provided by that version.
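By way of illustration, a typical repmgr session might look like the command sketch below; the hostnames, paths, and configuration files are hypothetical, so check the repmgr documentation for the exact flags your version expects:

```shell
# On a new standby: clone the master's data directory
repmgr -D /var/lib/postgresql/data -d repmgr -U repmgr standby clone master1

# Register the standby with the cluster
repmgr -f /etc/repmgr.conf standby register

# Show replication state across all nodes
repmgr -f /etc/repmgr.conf cluster show

# After a master failure: promote this standby...
repmgr -f /etc/repmgr.conf standby promote
# ...and repoint the remaining standbys at the new master
repmgr -f /etc/repmgr.conf standby follow
```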
pg_reorg is a simple but very handy tool that allows you to reorganize tables without downtime.
It helps to eliminate bloat, reduce I/O, improve performance, and delete large portions of big tables.
It may be especially handy in large sharded and replicated database environments.
Unfortunately, not too many people are familiar with the tool. This talk demonstrates how this tool may be used for making database administration a little easier.
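To make the idea concrete, here is a sketch of how pg_reorg is typically invoked; the table and database names are examples, and the flags should be confirmed against pg_reorg --help for your version:

```shell
# Rewrite one bloated table online, preserving its clustered order
pg_reorg --table=orders mydb

# Reorganize with no particular ordering
# (an online alternative to VACUUM FULL)
pg_reorg --no-order --table=orders mydb
```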
vFabric Postgres is a new offering from VMware. vFabric Postgres is available either as deployed by VMware vFabric Data Director or as a service on CloudFoundry.com.
This talk is about using vFabric Postgres from a user's perspective.
Solid-state technologies are changing the way that database administrators approach their Postgres platform. With solid-state, DBAs have been able to reduce their reliance on disk-based systems and instead create infrastructures that can sustainably meet their performance and reliability needs.
This presentation will focus on current implementation and usage of solid-state in the datacenter; Choosing a redundancy plan; Database Mirroring, Replication and third party solutions; and how solid-state technologies are being implemented to see significant performance improvements. The presentation will also discuss testing strategies for solid-state technologies and making sure your system is configured for optimal performance based on your requirements.
This talk will be presented by Sumeet Bansal, Principal Solutions Architect for Fusion-io. He advises on creating architectural solutions that use Fusion-io technology to blend performance, scalability, and simplicity. Sumeet has more than 11 years of IT and database infrastructure experience, including most recently serving as vice president of IT at the giant Internet retailer wine.com. He has extensive experience with MS SQL Server.
The key asset of many businesses is the data they store. So why do most programmers opt to implement their data integrity outside of the schema? Using concepts and methodologies from The Schemaverse, this session will show how you can benefit from creating self-validating databases, and discuss what limitations exist (real or perceived). Audience participation will be encouraged during the session.
My co-presenter for this session will be Michael Perklin.
VMware vFabric Data Director is a self-service database provisioning and operations platform that enables organizations to offer database-as-a-service (DBaaS) inside their datacenters. Coupled with VMware vSphere, the best platform for cloud infrastructures, vFabric Data Director fundamentally simplifies database deployment, operations, and scaling in cloud environments; in turn, it dramatically increases developer agility through self-service database management and improves datacenter efficiency by fully extending virtual infrastructure benefits to the database tier. PostgreSQL is the first database enabled on Data Director. This session will give you an in-depth look at vFabric Data Director with PostgreSQL to help you understand how you can stand up and operate DBaaS with PostgreSQL for your cloud.
Stado, formerly known as GridSQL, provides a powerful and flexible analytical environment allowing users to process large amounts of data using a shared-nothing, massively parallel processing (MPP) architecture with PostgreSQL and PostGIS. Data is automatically partitioned across multiple nodes and each node processes its subset of data, allowing queries to be distributed across the cluster and run in parallel. This fully open source architecture allows database performance to scale linearly as servers are added to the cluster while appearing as a single database to applications.
This presentation will demonstrate the 10-20x scalability and performance gains of PostgreSQL queries running in a Stado environment compared to a single PostgreSQL instance. Some intensive PostGIS queries will be the basis for the discussion. We will dig into how Stado plans a query capable of spanning multiple PostgreSQL servers and executes across those nodes using the Tiger data set as an example.
This talk is an introduction to Sequel, the Ruby database library. Sequel provides a simple, powerful, and flexible API for interacting with SQL databases. Sequel allows you to write all your SQL code by hand if you want, but it also allows you to use Ruby objects to represent SQL concepts, so that you rarely need to. Sequel includes a comprehensive and extensible ORM with extensive support for manipulating associated objects. Sequel supports advanced database features such as prepared statements, bound variables, savepoints, two-phase commit, transaction isolation, common table expressions, window functions, master/slave configurations, and database sharding.
BIRT is an open source Eclipse-based reporting system that integrates with Java/J2EE applications to produce compelling reports. The purpose of this talk is to show how easy it is to get BIRT up and running with PostgreSQL databases as well as how to create several simple reports that are graphically stunning.
vFabric Postgres (vPostgres) is a new offering from VMware, optimizing
PostgreSQL for virtualized platforms. We will describe our
improvements over stock PostgreSQL, such as database ballooning,
adaptive tuning of checkpoint parameters, and high availability
features. We will also talk about our backup/restore infrastructure
that combines PostgreSQL and VMware vSphere features to provide
Point-In-Time Recovery and Snapshot capabilities for managed databases.
This presentation explains the unique demands on I/O and memory that distinguish a database server from an application server. Topic areas include RAID Levels, SSD, BBU and some real world AWS cloud experiences.
I have developed a queueing library for Ruby that uses PostgreSQL as the backend. With the combination of row-level locking, PUB/SUB, and PL/pgSQL, PostgreSQL was a perfect fit for this database-backed queue. In the talk, I will highlight the motivations and pitfalls I discovered while building and maintaining Queue Classic. I will also discuss the environments in which Queue Classic is running. Queue Classic supports a variety of components: runtime, database provisioning, and financial services.
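The row-level locking plus PUB/SUB combination can be sketched in plain SQL; this is a simplified illustration of the general pattern, not Queue Classic's actual schema or queries:

```sql
CREATE TABLE jobs (id serial PRIMARY KEY, payload text);

-- A worker claims the first job no other worker holds; the advisory
-- lock is released automatically if the worker's session dies.
SELECT id, payload
FROM jobs
WHERE pg_try_advisory_lock(id)
ORDER BY id
LIMIT 1;

-- The PUB/SUB piece: tell idle workers that new work exists.
NOTIFY jobs_channel;
```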
While databases are increasingly being distributed across multiple nodes, the performance of every node still matters--especially if you're considering virtualized or cloud deployments that have their own specific trade-offs. Memory performance scaling as core count changes, all aspects of disk performance, and using pgbench to test PostgreSQL are all topics covered here.
The increasing use of virtual hardware and multi-node clusters makes the performance of individual nodes seem less relevant than it used to be. But databases have unique consistency and workload requirements that still make hardware performance and choice critical. And particularly with the complexity of today's hardware, running your own bottom-up benchmarks is vital for defending against the very common hardware vendor errors you should expect to happen to you eventually, not be surprised by. A large percentage of new server hardware ships with at least one major performance issue. Deploy a production server without making sure the hardware itself performs as expected, and troubleshooting the resulting problems is vastly more difficult.
Oh, and don't forget one more thing--the information you need to know to make informed hardware decisions changes constantly. This session includes the speaker's latest unique research into database performance tuning, which includes a variety of custom-designed tools aimed at this area. The "stream-scaling" project provides a new way to visualize how memory performance scales with the number of active cores involved in the workload, a vital component to quantify on today's many-core designs when running a database server. stream-scaling highlights the speed vs. capacity trade-offs in server memory you must be aware of to size the latest server hardware appropriately. The recently developed "seek-scaling" uses the same approach to reinvent measuring disk speed for databases much larger than the RAM in the server. And the benchmarking automation set of "pgbench-tools" is now easier than ever to use for comparing the performance of different systems or server configurations.
There are cases where a single DBMS vendor is not enough. In this talk we'll analyze how to painlessly solve the following real-world needs with the open source Tungsten Replicator: real-time integration, seamless migration, data warehousing, and leveraging the power of a specific DBMS SQL dialect. Not only will we learn how to move data from MySQL to PostgreSQL in real time (yes, let's free those MySQL petabytes for PG users!), but we'll also discuss a fresh and important way of extracting data from PostgreSQL and making it available to other DBMSes. We'll conclude with a demo of database updates moving in real time between different databases.
We'll have a look at backup and recovery methods for PostgreSQL:
- PostgreSQL pg_dump
- PostgreSQL pg_dumpall
- Restoring pg_dump files
- Restoring pg_dumpall files
- Customizing a restore via pg_dump TOC files
- PITR (Point In Time Recovery) Backups
- The PITR Recovery process
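As a reminder of the moving parts involved in the PITR items above, a minimal configuration looks roughly like this; all paths and the recovery target are illustrative, not prescriptive:

```
# postgresql.conf -- archive completed WAL segments as they fill
wal_level = archive
archive_mode = on
archive_command = 'cp %p /archive/%f'

# recovery.conf -- placed on the restored base backup to replay WAL
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2012-01-15 08:00:00'
```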
Whether building a product, or adding to your infrastructure, these days the use of Free and Open Source Software (FOSS) components is a natural and sensible choice. However, for any given cog in your machine, there are potentially many FOSS packages that fit the bill - how do you know which one to use?
Any significant software in your technology stack deserves a due diligence selection process. When open source software is involved, there are considerations both familiar and unfamiliar to many who are new to open source.
This presentation covers 10 criteria which should be considered as part of your decision. The concepts are illustrated through an example selection process in which a data persistence layer is being chosen for a web application. The potential candidates are PostgreSQL, MySQL, and Redis.
Database access from small-footprint hardware is becoming a mainstream problem for application developers. Given the wide disparity in platform capabilities, how can you make the best use of PostgreSQL from a mobile platform?
1. Overview of J2ME/CDC/MIDP architectures
2. The 'normal' answer to database connectivity
4. More than web services
5. Wrap-up / Q&A
Using PostgreSQL and PostGIS allows you to access and modify your spatial data using SQL. This talk will provide an overview on the advantages to using PostGIS, how to properly store and query your geographic data, and offer some solutions to common problems.
This talk will comprise:
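To give a flavor of storing and querying geographic data with SQL, here is a hedged sketch; the table, coordinates, and search radius are invented for illustration:

```sql
CREATE TABLE coffee_shops (
    id serial PRIMARY KEY,
    name text,
    geom geometry(Point, 4326)
);

-- Find shops within 1000 meters of a point, using a predicate
-- that can take advantage of a spatial index
SELECT name
FROM coffee_shops
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-122.3321, 47.6062), 4326)::geography,
    1000);
```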
While Bucardo is often known for being one of the few multi-master replication systems available for PostgreSQL, there are also many other uses for it as a general tool for:
* multi-master configuration
* live/minimal downtime database upgrades
* live/minimal server migrations
* data aggregator
Features of Bucardo will be examined, as well as an architectural overview. We will also review its real-life use in a live migration from an existing multi-node Slony/PostgreSQL 8.2 cluster in one data center to a PostgreSQL 9.0/Hot Standby cluster in another, with only minutes of downtime. Comparisons will be made between Bucardo's approach and Slony's, with brief coverage of the options these differences open up.
No existing familiarity with Bucardo is necessary, but knowledge of various existing replication systems and concepts will be a plus.
We will also cover the upcoming Bucardo 5.
Designed for the non-DBA, Dumb Simple PostgreSQL Performance provides down-to-earth performance advice for users of the database who don't want to bother with the finer details usually assigned to a Database Administrator.
The people who will get the most out of this talk are application developers, web developers and system administrators who have somehow also become a DBA.
The PostgreSQL database ships out of the box with a wide variety of unique features, including a powerful type system, a built-in key-value store, full text search, cryptography libraries, and a huge tool chest of available contrib modules for all kinds of different tasks.
Unfortunately, many Ruby developers have been locked in a box by less powerful data stores and while they have been leaning on PostgreSQL's rock-solid storage engine, they have constrained themselves to the lowest-common-denominator features.
In this talk, I want to encourage Ruby developers to reach beyond their usual set of tools, and I will talk about some of the long-standing features you can take advantage of today. We'll also talk about some of the features which aren't as well supported yet, and look at how we as a community can work together to close that gap and bring the full power of PostgreSQL to bear on our problems.
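For example, the built-in full text search mentioned above needs nothing beyond plain SQL; the posts table here is our own illustration:

```sql
CREATE TABLE posts (id serial PRIMARY KEY, body text);

-- Match documents containing both terms, with stemming handled by
-- the 'english' configuration; a GIN index on the tsvector
-- expression makes this fast at scale.
SELECT id
FROM posts
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & ruby');
```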
Use knowledge of database architecture to clarify some of the discussion around NoSQL and ORMs, avoid wasteful designs, find common ground between various database systems, and choose the right tools for development.
What parts of PostgreSQL are relational? Which piece guarantees ACID semantics? How much of PostgreSQL do you need just to do a simple CRUD application? Why is it so challenging to distribute a database system across many nodes? Are you reinventing a tool that already exists? Conversely, are you paying (in development time, administration overhead, execution time, etc.) for features you don't need? What if the SQL system that I use does not fit this architecture (e.g. SQLite) -- what does that mean? What if the NoSQL system I use started out looking radically new, but is looking more and more like SQL every day?
In addition to answering these questions, this talk will help you ask similar questions at the right time (that is, before you write the code), and detect dubious designs that are redundant and wasteful.
When you give your database server memory, you expect it's going to use it. But for what? A look inside PostgreSQL's buffer cache can tell you exactly what that memory is doing for you. Every systematic database tuning effort should include a look at this critical resource. When it comes to optimization work, profiling beats guessing every time. And understanding the buffer cache lets you tune complicated parts of the database like checkpoint writes in a much more robust way.
PostgreSQL keeps most of its working data inside a block of shared memory allocated when the server starts, used for caching disk reads and writes. Looking at the contents of that cache can give you valuable clues to how your database application really works. The best ways to handle many types of optimization tasks involve carefully measuring the variables you're changing, but most people change the size of this cache without any plan for measuring the impact. Information about data moving in and out of the cache is useful for performance tuning, query optimization, system monitoring, and even predicting the future requirements of your system.
This presentation aims to describe the basics of how the cache is organized, how to query its contents, and how to interpret the results of those queries. By monitoring what goes in and out of the cache, you get a unique window into what's really happening inside your database when it's running your application.
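Querying the cache's contents relies on the pg_buffercache contrib module; the query below is a simplified sketch that ignores the database and relation fork columns:

```sql
-- After installing the pg_buffercache contrib module:
-- which relations occupy the most shared buffers right now?
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```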
This talk will discuss the impacts and implementation of fast, automated provisioning of replicas -- both as writable 'forked' copies and read-only streaming replica 'followers' of PostgreSQL databases, as implemented at Heroku on a large scale as a service.
Included will be a brief summary of use cases, anecdotes, and challenges, as well as new trade-offs in the design of applications and partitioning of data. In addition, the technical architecture behind such a service and how it unifies rock-solid, low-impact disaster recovery, replication, and database-forking will be covered.
If you don't know what a race condition is, or you thought that SQL was immune from race conditions, this talk is for you. These are bugs that slip through testing, are hard to track down, and might cause invalid data in your database.
The good news is that 9.1 offers a cure: true serializability through Serialized Snapshot Isolation (SSI). It's easy to use and magically detects a live SQL race condition and turns it into an error.
Oh, did I say "magic"? It's actually technology developed by Kevin Grittner and Dan R.K. Ports, based on some fresh research by Michael J. Cahill, et al.
Of course, there is a catch: because race conditions may turn into errors, application developers need to be prepared to retry transactions when that happens. This is usually a minor burden, however, because all you need to do is send the same transaction again.
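In practice, adopting SSI is a one-line change per transaction; the accounts table here is a stock illustration, not from the talk:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If PostgreSQL detects a dangerous interleaving with a concurrent
-- transaction, a statement (or the COMMIT) fails with SQLSTATE 40001
-- (serialization_failure), and the application resubmits the same
-- transaction.
```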
TransLattice has developed new technology allowing geographically distributed, parallel databases. Starting with PostgreSQL technology and the open-source Postgres-R synchronous multimaster replication system, TransLattice has extended the relational database with transparent horizontal sharding and redundancy mechanisms.
In this talk, TransLattice’s use of PostgreSQL will be discussed, along with TransLattice’s open source strategy. A brief overview of the associated replication, data partitioning, and coherency approaches will also be given within the TransLattice Application Platform.
The audience will gain insights regarding:
• Horizontally scaling deployments of RDBMS and PostgreSQL
• The Postgres-R synchronous multi-master replication system
• TransLattice’s approaches within the PostgreSQL ecosystem.
As more information is collected, displaying it to end users is becoming increasingly difficult. Through the use of AJAX and HTML5, developers now have a canvas to present that information in a rich web-based client. In this presentation, you will learn how the Google Web Toolkit can visualize the data locked inside a PostgreSQL database in an interactive web environment. We will show code examples as well as a live demo.
A 45-minute talk on database editing and verification
with CFEngine, a tool for automating system administration. CFEngine
Nova (the commercial edition) supports system administration promises
about database state: that certain tables exist and have certain
schemas. I will demonstrate how this works.
Powerful datatypes such as hstore and json give PostgreSQL the most compelling feature of many NoSQL databases: management of semi-structured data. Unfortunately, the popular Ruby ORMs try very hard to equalize all databases, so these and other features often go unnoticed or underappreciated.
While there are some Ruby libraries providing application-level bindings for these features, they are generally new or otherwise not well-known. Developers are also often further discouraged by having to sacrifice the benefits of their ORM to use these libraries. Fortunately, the situation is starting to improve.
We will go into detail about what was needed to bring hstore support to Ruby, survey the current PostgreSQL and Ruby landscape, and discuss what needs to happen in the near future to improve their relationship.
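The underlying SQL feature those Ruby bindings expose looks like this; the products table is a hypothetical example:

```sql
CREATE EXTENSION hstore;

CREATE TABLE products (id serial PRIMARY KEY, attrs hstore);
INSERT INTO products (attrs) VALUES ('color => "red", size => "M"');

-- Pull out a single key, or filter on containment
-- (both indexable with GiST/GIN):
SELECT attrs -> 'color' FROM products;
SELECT * FROM products WHERE attrs @> 'size => "M"';
```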
Redis is an open source, high performance networked key value store that provides a rich set of features while staying simple and fast.
This talk introduces data types and operations available in Redis, and how it can be used alongside PostgreSQL to design high performance, write heavy applications that stay up.
Discussion will be based on our own experience handling a sustained 100 thousand writes a minute, and our use of Redis for caching, indexing, throttling and beyond.
Having difficulty translating complex business inquiries into SQL? Want an instant web interface for your relational database? Up for some tasty dashboards? Come hear about HTSQL.
HTSQL is a URI-based high-level query language / web service for PostgreSQL designed especially for creating interactive dashboards and answering complex business inquiries. We use HTSQL as a REST query interface for data integration and ad-hoc reporting by technical users or "accidental programmers".
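To give the flavor: an HTSQL query is just a URL path. Assuming a hypothetical schema with department and course tables (not from this abstract), counting courses per department might look like:

```
/department{name, count(course)}
```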
Application deployment has become increasingly complicated with today’s performance, scalability, reliability, and compliance requirements. Technologies, such as utility computing in the cloud, can help reduce costs but also introduce new challenges, including meeting application database and data storage requirements.
This talk includes a comparison of scaling and redundancy models. Attendees will learn how to maximize application resilience and performance while containing costs. In the current relational model, operations staff deals with the complexity of where data is located and how to scale up. In the do-it-yourself/big data model, the development staff does. New architectures are emerging that hand the responsibility to the infrastructure itself.
The audience will gain insights regarding:
• Realizing the benefits of Cloud Computing
o Scalability (Elasticity)
o Availability and Protection Against Data Loss
• Deployment models – Scale Up vs. Scale Out
• Conventional Transactional Infrastructure
• Parallel Databases for Analytics
• Specialized Architectures for Large Scale: NoSQL and Friends
• Emerging Efforts at Horizontal Scalability of Enterprise Application Stacks
The Schemaverse is a space-based strategy game implemented entirely within a PostgreSQL database. If you've ever wanted to use SQL to conquer the world, now you can do one better and conquer a universe. This talk will introduce you to the game mechanics and discuss the project's goals, growth and development.
A great overview of what is quite possibly the most important utility in the PostgreSQL arsenal: pg_dump, and its sister, pg_restore. We will discuss overall usage and best practices, as well as features not normally used (but which should be), such as the TOC file.
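The TOC-file feature works roughly like the command sketch below; the file and database names are examples:

```shell
# Dump in custom format so pg_restore can be selective
pg_dump -Fc mydb > mydb.dump

# Write out the dump's table of contents
pg_restore -l mydb.dump > mydb.toc

# Edit mydb.toc (prefix lines with ';' to skip objects, or reorder
# them), then restore using the edited list
pg_restore -L mydb.toc -d mydb_copy mydb.dump
```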
Compute-bound PostgreSQL data analytics and data mining queries can exhibit poor performance, as PostgreSQL's per-process model constrains a query to a single CPU core. With PgOpenCL Time Series, a new data type, compute-intensive queries are parallelized to take advantage of all CPU cores. If a discrete GPU, or AMD Fusion GPU, is available, queries can take advantage of the additional hundreds to thousands of cores for further speed-ups.
The Time Series data types are specialized data types that represent a sequence of data points measured typically at successive times spaced at uniform time intervals. Some typical examples are Wall Street closing prices, climate data and digitized signals. The PgOpenCL Time Series data types utilize native OpenCL data types to take maximum advantage of GPU hardware.
PgOpenCL is a new PostgreSQL stored procedure language that implements the Open Computing Language (OpenCL) for parallel processing on multi-core and many-core systems. OpenCL is freely available on Windows and Linux systems.
Multi-version concurrency control is a well-known and established key technology for high performance database systems. Many open source projects, as well as commercial products, offer various successful implementations of MVCC. However, its potential for clustering solutions is still underestimated. TransLattice Application Platform (TAP) leverages MVCC to be used in an automatically sharded, multi-master cluster.
The audience will gain insights into:
The key features of Postgres-R and TAP, including:
• The concurrent, optimistic application of change sets
• Conflict detection during normal operation
• Distributed cross-shard operations
• Runtime healing in a multi-master environment