一直在慢慢的摸索clickhouse,之前是用rpm包安装的,没有发现clickhouse-local,最近在centos上面编译成功以后发现多了clickhouse-local,那么这个玩意是什么鬼呢?官方的说法:
原来就是可以直接读取本地的文件进行查询,不用部署clickhouse-server。那么下面进行测试。准备一个测试文本:
[root@db_server_yayun_01 ~]# cat a.txt "yy","18" "bb","20" [root@db_server_yayun_01 ~]#
直接看命令,后面解释
[root@db_server_yayun_01 ~]# clickhouse-local -N test_table --file='a.txt' --input-format=CSV -S "user String, age Int32" -q "SELECT * from test_table FORMAT Pretty" Read 2 rows, 30.00 B in 0.001 sec., 1913 rows/sec., 28.03 KiB/sec. ┏━━━━━━┳━━━━━┓ ┃ user ┃ age ┃ ┡━━━━━━╇━━━━━┩ │ yy │ 18 │ ├──────┼─────┤ │ bb │ 20 │ └──────┴─────┘ [root@db_server_yayun_01 ~]#
可以看见查询出来了。
-N 指定表名,如果不指定默认是table
--file 指定读取的文件
--input-format=CSV,指定读取文件的格式。这里是CSV格式
-S 定义表的字段以及类型
-q 指定查询语句。
更多的参数可以运行clickhouse-local --help
上面是简单的测试,下面来试试官方文档提到的美国民用航空数据。
[root test_clickhouse_data]$ /root/clickhouse-local -N ontime --file='On_Time_On_Time_Performance_2017_1.csv' --input-format=CSV -S "Year String, Quarter String, Month String, DayofMonth String, DayOfWeek String, FlightDate Date, UniqueCarrier FixedString(7), AirlineID String, Carrier FixedString(2), TailNum String, FlightNum String, OriginAirportID String, OriginAirportSeqID String, OriginCityMarketID String, Origin FixedString(5), OriginCityName String, OriginState FixedString(2), OriginStateFips String, OriginStateName String, OriginWac String, DestAirportID String, DestAirportSeqID String, DestCityMarketID String, Dest FixedString(5), DestCityName String, DestState FixedString(2), DestStateFips String, DestStateName String, DestWac String, CRSDepTime String, DepTime String, DepDelay String, DepDelayMinutes String, DepDel15 String, DepartureDelayGroups String, DepTimeBlk String, TaxiOut String, WheelsOff String, WheelsOn String, TaxiIn String, CRSArrTime String, ArrTime String, ArrDelay String, ArrDelayMinutes String, ArrDel15 String, ArrivalDelayGroups String, ArrTimeBlk String, Cancelled String, CancellationCode FixedString(1), Diverted String, CRSElapsedTime String, ActualElapsedTime String, AirTime String, Flights String, Distance String, DistanceGroup String, CarrierDelay String, WeatherDelay String, NASDelay String, SecurityDelay String, LateAircraftDelay String, FirstDepTime String, TotalAddGTime String, LongestAddGTime String, DivAirportLandings String, DivReachedDest String, DivActualElapsedTime String, DivArrDelay String, DivDistance String, Div1Airport String, Div1AirportID String, Div1AirportSeqID String, Div1WheelsOn String, Div1TotalGTime String, Div1LongestGTime String, Div1WheelsOff String, Div1TailNum String, Div2Airport String, Div2AirportID String, Div2AirportSeqID String, Div2WheelsOn String, Div2TotalGTime String, Div2LongestGTime String, Div2WheelsOff String, Div2TailNum String, Div3Airport String, Div3AirportID String, Div3AirportSeqID String, Div3WheelsOn String, Div3TotalGTime String, Div3LongestGTime String, Div3WheelsOff String, Div3TailNum String, Div4Airport String, Div4AirportID String, Div4AirportSeqID String, Div4WheelsOn String, Div4TotalGTime String, Div4LongestGTime String, Div4WheelsOff String, Div4TailNum String, Div5Airport String, Div5AirportID String, Div5AirportSeqID String, Div5WheelsOn String, Div5TotalGTime String, Div5LongestGTime String, Div5WheelsOff String, Div5TailNum String" -q "SELECT DestCityName, uniqExact(OriginCityName) AS u FROM ontime GROUP BY DestCityName ORDER BY u DESC LIMIT 10 FORMAT Pretty" Read 450017 rows, 499.62 MiB in 3.423 sec., 131473 rows/sec., 145.97 MiB/sec. ┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━┓ ┃ DestCityName ┃ u ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━┩ │ Atlanta, GA │ 155 │ ├───────────────────────┼─────┤ │ Chicago, IL │ 139 │ ├───────────────────────┼─────┤ │ Denver, CO │ 118 │ ├───────────────────────┼─────┤ │ Dallas/Fort Worth, TX │ 113 │ ├───────────────────────┼─────┤ │ Minneapolis, MN │ 108 │ ├───────────────────────┼─────┤ │ Houston, TX │ 105 │ ├───────────────────────┼─────┤ │ Detroit, MI │ 99 │ ├───────────────────────┼─────┤ │ Phoenix, AZ │ 83 │ ├───────────────────────┼─────┤ │ Salt Lake City, UT │ 79 │ ├───────────────────────┼─────┤ │ Newark, NJ │ 78 │ └───────────────────────┴─────┘
当然速度肯定比不上导入以后查询的速度。不过还是非常方便。