Vertica Integrating with (Kerberized) Hadoop
一、Vertica 與 Hadoop
Hadoop 與 Vertica 一樣,使用節(jié)點(diǎn)集群進(jìn)行分布式處理。
Vertica 與 Hadoop 集成主要使用 HDFS,即 Hadoop 分布式文件系統(tǒng)。
可以通過多種方式將 Vertica 與 HDFS 結(jié)合使用:
1、使用 HCATALOG 直接訪問 Hive 數(shù)據(jù)庫進(jìn)行數(shù)據(jù)查詢

2、使用外部表訪問 HDFS 數(shù)據(jù)
通過在數(shù)據(jù)庫創(chuàng)建外部表直接訪問 hdfs 的數(shù)據(jù),實(shí)現(xiàn)對數(shù)據(jù)的分層管理,冷數(shù)據(jù)存放在 hdfs,熱數(shù)據(jù)存放在數(shù)據(jù)庫,釋放數(shù)據(jù)庫存儲空間,降低存儲壓力。

3、將 HDFS 作為 Vertica 數(shù)據(jù)庫的 Storage Location
HDFS 可以作為 Vertica 的一個(gè)存儲路徑,由數(shù)據(jù)庫進(jìn)行管理。

4、以列存儲格式導(dǎo)出到 HDFS
將 Vertica 數(shù)據(jù)庫中的數(shù)據(jù)以 Hadoop 列格式(Parquet、ORC)導(dǎo)出到 HDFS 與其他 Hadoop 組件共享。

5、HDFS 作為 EON 模式的共享存儲(Communal Storage)

二、Vertica 集成帶 Kerberos 認(rèn)證的 Hadoop 集群步驟
0 環(huán)境描述
Vertica 集群:v001,v002,v003
Kerberos Server:k001
hdfs namenode:h001
hdfs data node:h002,h003,h004
1 Vertica 集群操作系統(tǒng)配置
分別修改 Vertica 集群3臺服務(wù)器的/etc/hosts文件
vi /etc/hosts 192.168.1.101 v001.hadoop.com v001 192.168.1.102 v002.hadoop.com v002 192.168.1.103 v003.hadoop.com v003 192.168.1.111 k001 192.168.1.121 h001 192.168.1.122 h002 192.168.1.123 h003 192.168.1.124 h004
2 生成票據(jù)
登錄到KDC服務(wù)器,使用kadmin登錄票據(jù)管理添加、導(dǎo)出票據(jù)
# 添加票據(jù)
addprinc -randkey -requires_preauth +allow_forwardable vertica/v001.hadoop.com@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable vertica/v002.hadoop.com@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable vertica/v003.hadoop.com@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable host/v001.hadoop.com@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable host/v002.hadoop.com@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable host/v003.hadoop.com@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable host/v001@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable host/v002@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable host/v003@HADOOP.COM
addprinc -randkey -requires_preauth +allow_forwardable dbadmin@HADOOP.COM
# 導(dǎo)出票據(jù)
xst -k /tmp/vertica.keytab vertica/v001.hadoop.com@HADOOP.COM vertica/v002.hadoop.com@HADOOP.COM vertica/v003.hadoop.com@HADOOP.COM host/v001.hadoop.com host/v002.hadoop.com host/v003.hadoop.com host/v001@HADOOP.COM host/v002@HADOOP.COM host/v003@HADOOP.COM dbadmin@HADOOP.COM
將上面生成的票據(jù)拷貝到Vertica集群3個(gè)節(jié)點(diǎn)的/home/dbadmin/目錄下,且票據(jù)文件的權(quán)限為600,屬主為dbadmin
[root@V001 ~]# cls_run ls -l /home/dbadmin/vertica.keytab -rw------- 1 dbadmin verticadba 5568 Feb 27 18:53 /home/dbadmin/vertica.keytab -rw------- 1 dbadmin verticadba 5568 Feb 27 18:53 /home/dbadmin/vertica.keytab -rw------- 1 dbadmin verticadba 5568 Feb 27 18:53 /home/dbadmin/vertica.keytab
3 數(shù)據(jù)庫配置
1、將Hadoop配置文件hdfs-site.xml和core-site.xml文件存放到Vertica集群所有節(jié)點(diǎn)相同路徑(eg:/opt/hadoopclient/),并保證dbadmin用戶對該路徑有讀寫權(quán)限,然后設(shè)置數(shù)據(jù)庫HadoopConfDir參數(shù):
ALTER DATABASE mydb SET HadoopConfDir = '/opt/hadoopclient/';
2、設(shè)置數(shù)據(jù)庫參數(shù)KerberosKeyTabFile指向票據(jù)文件路徑:
ALTER DATABASE mydb SET KerberosKeytabFile = '/home/dbadmin/vertica.keytab';
所有節(jié)點(diǎn)的票據(jù)文件都存放在同一路徑(例如/home/dbadmin/vertica.keytab),并且dbadmin用戶有權(quán)限訪問該路徑。
3、設(shè)置數(shù)據(jù)庫參數(shù)KerberosServiceName:
ALTER DATABASE mydb SET KerberosServiceName = 'vertica';
4、設(shè)置域名,與Hadoop相同:
ALTER DATABASE mydb SET KerberosRealm = 'HADOOP.COM'
5、設(shè)置數(shù)據(jù)庫參數(shù)KerberosHostname
select set_config_parameter('KerberosHostname', 'v001.hadoop.com');
6、重啟數(shù)據(jù)庫,使參數(shù)生效:
[dbadmin@szxtsp104 ~]$ admintools -t stop_db -d <dbname> -p<dbpassword> -F [dbadmin@szxtsp104 ~]$ admintools -t start_db -d <dbname> -p<dbpassword> -F
4 驗(yàn)證
dbadmin=> select kerberos_config_check();
--------------------------------------------------------------------
ok: kinit exists
ok: klist exists
ok: krb5 exists at [/etc/krb5.conf]
ok: Vertica Keytab file is set to [/home/dbadmin/vertica.keytab]
ok: Vertica Keytab file exists at [/home/dbadmin/vertica.keytab]
Kerberos configuration parameters set in the database
KerberosServiceName : [vertica]
KerberosHostname : [v001.hadoop.com]
KerberosRealm : [HADOOP.COM]
KerberosKeytabFile : [/home/dbadmin/vertica.keytab]
Vertica Principal: [vertica/v001.hadoop.com@HADOOP.COM]
ok: Can read Vertica keys
ok: Can get tickets for Vertica principal
ok: Vertica can kinit
(1 row)
vsql -h v001 -U dbadmin <<- EOF copy public.test from 'webhdfs://h001:25002/user/hive/warehouse/cmdata.db/c04_org_info/stat_dt=20161204/000000_0' orc direct; EOF





