I have a Vertica DB that is supposed to be used for my "Hot" data. I figured out that I can save my "Cold" data as Parquet files on HDFS as an external table. I also was able to export my data to Parquet file. I have a few questions regarding this process: * Exporting to Parquet - how do I do it often? Do I need to run the queries as cron tab? * When I create an external table, how do I "connect" it to my main table? I mean when running any query, I want Vertica to search also in external tables related to the main table where the "hot" data is stored.
Monica (imported from SE)
If you want to do periodic exports you'll need to use an external scheduler (like cron). You can export to HDFS, NFS, or S3. If you have some data (your hot data) in Vertica and other (cold) data in external tables, those are two separate tables. You can query them together (select... from hot,cold), but your external Parquet data isn't "in" the table that holds your Vertica (ROS, hot) data. When you query an external table, Vertica reads the relevant data from the external location. In a sense, it's doing a lightweight load using the FROM clause of the CREATE EXTERNAL TABLE expression on each query. (The Parquet columnar format and optimizations like predicate pushdown mean that's not painful.) This means that if your FROM clause is a glob (like `hdfs:///data/*/*.parquet`), if you export more data you don't need to update the table definition -- it just works. Note that each export needs to be into a new directory; if you put them all under the same parent directory, then you can use a glob as in my example. See [the documentation](https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/EXPORTTOPARQUET.htm).