Syncing MySQL tables to Elasticsearch with go-mysql-elasticsearch under Docker

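go-mysql-elasticsearch is a service that automatically syncs MySQL data into Elasticsearch. It uses mysqldump to sync the full data set first, then follows the binlog to sync incremental changes.
Project page: https://github.com/siddontang/go-mysql-elasticsearch

The go-mysql-elasticsearch project ships with its own Dockerfile, so it can be run directly with docker run; only a little configuration is needed.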

Install and configure MySQL

MySQL needs to be set up for master-slave replication, with the binlog enabled in row format:

server-id=1
log-bin=mysql-bin.log
binlog-do-db=ultron
binlog-do-db=ultron_test
expire_logs_days=1
binlog_format=row
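
As a quick sanity check before starting the sync, the settings above can be verified with a small script. This is only a sketch: the helper names parse_mysql_options and check_binlog_settings are made up for illustration, and it only looks at the three options this setup relies on (server-id, log-bin, binlog_format).

```python
def parse_mysql_options(cnf_text):
    """Collect key=value options; repeated keys keep every value."""
    options = {}
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and section headers such as [mysqld].
        if not line or line[0] in "#;[":
            continue
        key, _, value = line.partition("=")
        # MySQL accepts dashes and underscores interchangeably in option names.
        key = key.strip().replace("-", "_").lower()
        options.setdefault(key, []).append(value.strip())
    return options


def check_binlog_settings(cnf_text):
    """Return a list of problems; an empty list means the fragment looks fine."""
    options = parse_mysql_options(cnf_text)
    problems = []
    if "server_id" not in options:
        problems.append("server-id is not set")
    if "log_bin" not in options:
        problems.append("log-bin is not set")
    # If an option is repeated, the last value is the one checked here.
    fmt = options.get("binlog_format", [""])[-1].lower()
    if fmt != "row":
        problems.append("binlog_format should be row, found %r" % fmt)
    return problems


if __name__ == "__main__":
    sample = """
server-id=1
log-bin=mysql-bin.log
binlog-do-db=ultron
binlog-do-db=ultron_test
expire_logs_days=1
binlog_format=row
"""
    print(check_binlog_settings(sample) or "binlog settings look fine")
```

The parser is written by hand instead of using configparser because options such as binlog-do-db legitimately appear more than once.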

Install ELK

See http://www.heoffice.com/docker/185/

Run go-mysql-elasticsearch in Docker

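It can be run directly with docker run, but I prefer docker-compose. The advantages of docker-compose are:
* The configuration file ./etc/river.toml can be mounted using a relative path
* It is easy to merge into another docker-compose setup, where containers can reach each other by container name
* Running it is simple: docker-compose up, or docker-compose -f xxx.yml up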

git clone git@github.com:siddontang/go-mysql-elasticsearch.git
cd go-mysql-elasticsearch

Create docker-compose.yml

version: '3'
services:
    go-mysql-elasticsearch:
        build: .
        restart: always
        ports:
            - 12800:12800
        volumes:
            - "./etc/river.toml:/go/etc/river.toml"

Edit the sync configuration file etc/river.toml

# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "192.168.2.119:3306"
my_user = "ultron"
my_pass = "ultron"
my_charset = "utf8mb4"

# Set true when elasticsearch use https
#es_https = false
# Elasticsearch address
es_addr = "192.168.2.119:9200"
# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""

# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing. 
# TODO: support other storage, like etcd. 
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave 
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"

# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false

# minimal items to be inserted in one bulk
bulk_size = 128

# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"

# Ignore table without primary key
skip_no_pk_table = false

# MySQL data source
[[source]]
schema = "ultron"

# Only below tables will be synced into Elasticsearch.
# "t_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
tables = ["advertiser"]

[[source]]
schema = "ultron_test"

# Only below tables will be synced into Elasticsearch.
# "t_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
tables = ["advertiser","offer"]
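
The comments in the configuration mention that an entry in tables can be a pattern such as "t_[0-9]{4}" covering many sub tables. The sketch below illustrates which table names such a pattern would pick up. It is not the tool's own code: select_tables is a made-up helper, and it assumes the pattern must match the whole table name, which may differ from how go-mysql-elasticsearch anchors the pattern internally.

```python
import re


def select_tables(patterns, existing_tables):
    """Return the existing tables matched by at least one pattern.

    Assumption: a pattern has to match the full table name.
    """
    selected = []
    for name in existing_tables:
        if any(re.fullmatch(pattern, name) for pattern in patterns):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical table names, used only to illustrate the matching.
    tables = ["advertiser", "offer", "t_0000", "t_1023", "t_12"]
    print(select_tables(["advertiser", "t_[0-9]{4}"], tables))
    # prints ['advertiser', 't_0000', 't_1023']
```

A plain name like "advertiser" only matches itself, while "t_[0-9]{4}" requires exactly four digits after the prefix, so t_12 is left out.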

Run:

docker-compose up

Result

In Kibana's Management page, the index creation screen lists the advertiser and offer indices that were to be synced, which means the sync succeeded.
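
The same check can be done without Kibana by asking Elasticsearch for its index list through the _cat/indices API. The following is a minimal sketch, assuming Elasticsearch is reachable over plain HTTP without authentication (es_user and es_pass are empty in the configuration above); the function names are made up for illustration.

```python
import json
import urllib.request


def index_names(cat_indices_json):
    """Extract index names from a _cat/indices?format=json response body."""
    return {row["index"] for row in json.loads(cat_indices_json)}


def fetch_index_names(es_addr):
    """Query Elasticsearch at host:port and return the set of index names."""
    url = "http://%s/_cat/indices?format=json" % es_addr
    with urllib.request.urlopen(url, timeout=5) as resp:
        return index_names(resp.read().decode("utf-8"))


def missing_indices(existing, expected):
    """Return the expected index names that do not exist yet, sorted."""
    return sorted(set(expected) - set(existing))


if __name__ == "__main__":
    # Shortened sample response, used so the demo runs without a server.
    sample = '[{"health": "yellow", "index": "advertiser"}, {"health": "green", "index": ".kibana"}]'
    print(missing_indices(index_names(sample), ["advertiser", "offer"]))
    # prints ['offer']
```

Against the setup in this post, missing_indices(fetch_index_names("192.168.2.119:9200"), ["advertiser", "offer"]) should return an empty list once both tables have been synced.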

Notes

If docker-compose up fails with

go-mysql-elasticsearch_1  | /usr/bin/mysqldump: unknown variable 'set-gtid-purged=OFF'

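then the mysqldump inside the container does not recognize this option. In that case, open /go/dump/dump.go in the source code and find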

args = append(args, "--set-gtid-purged=OFF")

Comment it out, then run docker-compose up --build again.

References

https://github.com/siddontang/go-mysql-elasticsearch