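Sphinx
URL: http://sphinxsearch.com/downloads/current/
Click the latest release to get its download link.
On the server, type wget followed by the link you just copied, for example wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz, press Enter, and wait for the download to finish.
Alternatively, download the package on another machine and upload it to the Linux system.
Steps:
Download:
wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
Extract:
tar zxvf sphinx-2.2.11-release.tar.gz
Enter the directory:
cd sphinx-2.2.11-release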
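The download, extract and cd steps can be tied to a single version variable so that a later release only needs one change. This is a minimal sketch: the variable names and the fetch_and_unpack helper are made up for illustration, and the helper is only defined, not called, so nothing is downloaded until you invoke it yourself.

```shell
#!/bin/sh
# Sketch: derive the tarball name, URL and source directory from one version variable.
SPHINX_VERSION="2.2.11"
TARBALL="sphinx-${SPHINX_VERSION}-release.tar.gz"
URL="http://sphinxsearch.com/files/${TARBALL}"
SRC_DIR="${TARBALL%.tar.gz}"   # strip the archive suffix to get the directory name

# Hypothetical helper: download, unpack and enter the source tree.
# It is defined here but not called automatically.
fetch_and_unpack() {
    wget "$URL" && tar zxvf "$TARBALL" && cd "$SRC_DIR"
}

echo "URL:    $URL"
echo "Source: $SRC_DIR"
```

Calling fetch_and_unpack afterwards performs the same three commands as the steps above.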
Configure the build, then compile and install with make && make install:
./configure --prefix=/usr/local/sphinx --with-mysql
This step may fail with an error.
Fix:
vi configure
Find the #define USE_LIBICONV line (around line 8179).
Edit the configure file so that the value at the end of #define USE_LIBICONV changes from 1 to 0.
Then build again.
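The manual vi edit can also be scripted with sed. The sketch below runs against a scratch file rather than the real configure script, and the single sample line in it is invented for the demonstration; disable_libiconv is a hypothetical helper name. It rewrites through a temporary file instead of relying on sed -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/sh
# Demonstrates the USE_LIBICONV edit on a scratch file (sample content is made up).
scratch=$(mktemp)
printf '%s\n' '#define USE_LIBICONV 1' > "$scratch"

# Hypothetical helper: flip USE_LIBICONV from 1 to 0 in the given file.
disable_libiconv() {
    sed 's/#define USE_LIBICONV 1/#define USE_LIBICONV 0/' "$1" > "$1.new" \
        && mv "$1.new" "$1"
}

disable_libiconv "$scratch"
result=$(cat "$scratch")
echo "$result"
rm -f "$scratch"
```

In the real source tree the same helper would be pointed at ./configure before the build is repeated; keep a backup copy of configure first, since the exact line may differ between releases.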
The final output indicates that the installation succeeded.
cd /usr/local/sphinx/etc
vi sphinx.conf.dist
Change the settings in sphinx.conf.dist to the following:
#
# Sphinx configuration file sample
#
# WARNING! while this sample file mentions all available options,
# it contains (very) short helper descriptions only. Please refer to
# doc/sphinx.html for details.
#
#############################################################################
## data source definition
#############################################################################
source src1
{
# data source type. mandatory, no default value
# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
type = mysql
#####################################################################
## SQL settings (for 'mysql' and 'pgsql' types)
#####################################################################
# some straightforward parameters for SQL source types
sql_host = localhost
sql_user = root
sql_pass = 2018
sql_db = tpblog
sql_port = 3306 # optional, default is 3306
# UNIX socket name
# optional, default is empty (reuse client library defaults)
# usually '/var/lib/mysql/mysql.sock' on Linux
# usually '/tmp/mysql.sock' on FreeBSD
#
sql_sock = /tmp/mysql.sock
# MySQL specific client connection flags
# optional, default is 0
#
# mysql_connect_flags = 32 # enable compression
# MySQL specific SSL certificate settings
# optional, defaults are empty
#
# mysql_ssl_cert = /etc/ssl/client-cert.pem
# mysql_ssl_key = /etc/ssl/client-key.pem
# mysql_ssl_ca = /etc/ssl/cacert.pem
# MS SQL specific windows authentication mode flag
# MUST be in sync with charset_type index-level setting
# optional, default is 0
#
# mssql_winauth = 1 # use currently logged on user credentials
# ODBC specific DSN (data source name)
# mandatory for odbc source type, no default value
#
# odbc_dsn = DBQ=C:\data;DefaultDir=C:\data;Driver={Microsoft Text Driver (*.txt; *.csv)};
# sql_query = SELECT id, data FROM documents.csv
# ODBC and MS SQL specific, per-column buffer sizes
# optional, default is auto-detect
#
# sql_column_buffers = content=12M, comments=1M
# pre-query, executed before the main fetch query
# multi-value, optional, default is empty list of queries
#
sql_query_pre = SET NAMES utf8
sql_query_pre = SET session query_cache_type=OFF
# main document fetch query
# mandatory, integer document ID field MUST be the first selected column
sql_query = select id,des,des as attr_des,keywords,keywords as attr_keywords,content,content as attr_content from tpblog_arcdata
# joined/payload field fetch query
# joined fields let you avoid (slow) JOIN and GROUP_CONCAT
# payload fields let you attach custom per-keyword values (eg. for ranking)
#
# syntax is FIELD-NAME 'from' ( 'query' | 'payload-query' ); QUERY
# joined field QUERY should return 2 columns (docid, text)
# payload field QUERY should return 3 columns (docid, keyword, weight)
#
# requires that query results are in ascending document ID order!
# multi-value, optional, default is empty list of queries
#
# sql_joined_field = tags from query; SELECT docid, CONCAT('tag',tagid) FROM tags ORDER BY docid ASC
# sql_joined_field = wtags from payload-query; SELECT docid, tag, tagweight FROM tags ORDER BY docid ASC
# file based field declaration
#
# content of this field is treated as a file name
# and the file gets loaded and indexed in place of a field
#
# max file size is limited by max_file_field_buffer indexer setting
# file IO ERRORs are non-fatal and get reported as warnings
#
# sql_file_field = content_file_path
# range query setup, query that must return min and max ID values
# optional, default is empty
#
# sql_query will need to reference $start and $end boundaries
# if using ranged query:
#
# sql_query = \
# SELECT doc.id, doc.id AS group, doc.title, doc.data \
# FROM documents doc \
# WHERE id>=$start AND id<=$end
#
# sql_query_range = SELECT MIN(id),MAX(id) FROM documents
# range query step
# optional, default is 1024
#
# sql_range_step = 1000
# unsigned integer attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# optional bit size can be specified, default is 32
#
# sql_attr_uint = author_id
# sql_attr_uint = forum_id:9 # 9 bits for forum_id
sql_attr_uint = article_aid
# boolean attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# equivalent to sql_attr_uint with 1-bit size
#
# sql_attr_bool = is_deleted
# bigint attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# declares a signed (unlike uint!) 64-bit attribute
#
# sql_attr_bigint = my_bigint_id
# UNIX timestamp attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# similar to integer, but can also be used in date functions
#
# sql_attr_timestamp = posted_ts
# sql_attr_timestamp = last_edited_ts
# sql_attr_timestamp = date_added
# floating point attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# values are stored in single precision, 32-bit IEEE 754 format
#
# sql_attr_float = lat_radians
# sql_attr_float = long_radians
# multi-valued attribute (MVA) attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# MVA values are variable length lists of unsigned 32-bit integers
#
# syntax is ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]
# ATTR-TYPE is 'uint' or 'timestamp'
# SOURCE-TYPE is 'field', 'query', or 'ranged-query'
# QUERY is SQL query used to fetch all ( docid, attrvalue ) pairs
# RANGE-QUERY is SQL query used to fetch min and max ID values, similar to 'sql_query_range'
#
# sql_attr_multi = uint tag from query; SELECT docid, tagid FROM tags
# sql_attr_multi = uint tag from ranged-query; \
# SELECT docid, tagid FROM tags WHERE id>=$start AND id<=$end; \
# SELECT MIN(docid), MAX(docid) FROM tags
# string attribute declaration
# multi-value (an arbitrary number of these is allowed), optional
# lets you store and retrieve strings
#
sql_attr_string = attr_des
sql_attr_string = attr_keywords
sql_attr_string = attr_content
# JSON attribute declaration
# multi-value (an arbitrary number of these is allowed), optional
# lets you store a JSON document as an (in-memory) attribute for later use
#
# sql_attr_json = properties
# combined field plus attribute declaration (from a single column)
# stores column as an attribute, but also indexes it as a full-text field
#
# sql_field_string = author
# post-query, executed on sql_query completion
# optional, default is empty
#
# sql_query_post =
# post-index-query, executed on successful indexing completion
# optional, default is empty
# $maxid expands to max document ID actually fetched from DB
#
# sql_query_post_index = REPLACE INTO counters ( id, val ) \
# VALUES ( 'max_indexed_id', $maxid )
# ranged query throttling, in milliseconds
# optional, default is 0 which means no delay
# enforces given delay before each query step
sql_ranged_throttle = 0
# kill-list query, fetches the document IDs for kill-list
# k-list will suppress matches from preceding indexes in the same query
# optional, default is empty
#
# sql_query_killlist = SELECT id FROM documents WHERE edited>=@last_reindex
# columns to unpack on indexer side when indexing
# multi-value, optional, default is empty list
#
# unpack_zlib = zlib_column
# unpack_mysqlcompress = compressed_column
# unpack_mysqlcompress = compressed_column_2
# maximum unpacked length allowed in MySQL COMPRESS() unpacker
# optional, default is 16M
#
# unpack_mysqlcompress_maxsize = 16M
# hook command to run when SQL connection succeeds
# optional, default value is empty (do nothing)
#
# hook_connect = bash sql_connect.sh
# hook command to run after (any) SQL range query
# it may print out "Minid maxid" (w/o quotes) to override the range
# optional, default value is empty (do nothing)
#
# hook_query_range = bash sql_query_range.sh
# hook command to run on successful indexing completion
# $maxid expands to max document ID actually fetched from DB
# optional, default value is empty (do nothing)
#
# hook_post_index = bash sql_post_index.sh $maxid
#####################################################################
## xmlpipe2 settings
#####################################################################
# type = xmlpipe
# shell command to invoke xmlpipe stream producer
# mandatory
#
# xmlpipe_command = cat /usr/local/sphinx/var/test.xml
# xmlpipe2 field declaration
# multi-value, optional, default is empty
#
# xmlpipe_field = subject
# xmlpipe_field = content
# xmlpipe2 attribute declaration
# multi-value, optional, default is empty
# all xmlpipe_attr_XXX options are fully similar to sql_attr_XXX
# examples:
#
# xmlpipe_attr_timestamp = published
# xmlpipe_attr_uint = author_id
# xmlpipe_attr_bool = is_enabled
# xmlpipe_attr_float = latitude
# xmlpipe_attr_bigint = guid
# xmlpipe_attr_multi = tags
# xmlpipe_attr_multi_64 = tags64
# xmlpipe_attr_string = title
# xmlpipe_attr_json = extra_data
# xmlpipe_field_string = content
# perform UTF-8 validation, and filter out incorrect codes
# avoids XML parser choking on non-UTF-8 documents
# optional, default is 0
#
# xmlpipe_fixup_utf8 = 1
}
# inherited source example
#
# all the parameters are copied from the parent source,
# and may then be overridden in this source definition
#source src1throttled : src1
#{
# sql_ranged_throttle = 100
#}
#############################################################################
## index definition
#############################################################################
# local index example
#
# this is an index which is stored locally in the filesystem
#
# all indexing-time options (such as morphology and charsets)
# are configured per local index
index test1
{
# index type
# optional, default is 'plain'
# known values are 'plain', 'distributed', and 'rt' (see samples below)
# type = plain
# document source(s) to index
# multi-value, mandatory
# document IDs must be globally unique across all sources
source = src1
# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
path = /usr/local/sphinx/var/data/test1
# document attribute values (docinfo) storage mode
# optional, default is 'extern'
# known values are 'none', 'extern' and 'inline'
docinfo = extern
# dictionary type, 'crc' or 'keywords'
# crc is faster to index when no substring/wildcards searches are needed
# crc with substrings might be faster to search but is much slower to index
# (because all substrings are pre-extracted as individual keywords)
# keywords is much faster to index with substrings, and index is much (3-10x) smaller
# keywords supports wildcards, crc does not, and never will
# optional, default is 'keywords'
dict = keywords
# memory locking for cached data (.spa and .spi), to prevent swapping
# optional, default is 0 (do not mlock)
# requires searchd to be run from root
mlock = 0
# a list of morphology preprocessors to apply
# optional, default is empty
#
# builtin preprocessors are 'none', 'stem_en', 'stem_ru', 'stem_enru',
# 'soundex', and 'metaphone'; additional preprocessors available from
# libstemmer are 'libstemmer_XXX', where XXX is algorithm code
# (see libstemmer_c/libstemmer/modules.txt)
#
# morphology = stem_en, stem_ru, soundex
# morphology = libstemmer_german
# morphology = libstemmer_sv
morphology = none
# minimum word length at which to enable stemming
# optional, default is 1 (stem everything)
#
# min_stemming_len = 1
# stopword files list (space separated)
# optional, default is empty
# contents are plain text, charset_table and stemming are both applied
#
# stopwords = /usr/local/sphinx/var/data/stopwords.txt
# wordforms file, in "mapfrom > mapto" plain text format
# optional, default is empty
#
# wordforms = /usr/local/sphinx/var/data/wordforms.txt
# tokenizing exceptions file
# optional, default is empty
#
# plain text, case sensitive, space insensitive in map-from part
# one "Map Several Words => ToASingleOne" entry per line
#
# exceptions = /usr/local/sphinx/var/data/exceptions.txt
# embedded file size limit
# optional, default is 16K
#
# exceptions, wordforms, and stopwords files smaller than this limit
# are stored in the index; otherwise, their paths and sizes are stored
#
# embedded_limit = 16K
# minimum indexed word length
# default is 1 (index everything)
min_word_len = 1
# ignored characters list
# optional, default value is empty
#
# ignore_chars = U+00AD
# minimum word prefix length to index
# optional, default is 0 (do not index prefixes)
#
# min_prefix_len = 0
# minimum word infix length to index
# optional, default is 0 (do not index infixes)
#
# min_infix_len = 0
# maximum substring (prefix or infix) length to index
# optional, default is 0 (do not limit substring length)
#
# max_substring_len = 8
# list of fields to limit prefix/infix indexing to
# optional, default value is empty (index all fields in prefix/infix mode)
#
# prefix_fields = filename
# infix_fields = url, domain
# expand keywords with exact forms and/or stars when searching fit indexes
# search-time only, does not affect indexing, can be 0 or 1
# optional, default is 0 (do not expand keywords)
#
# expand_keywords = 1
# n-gram length to index, for CJK indexing
# only supports 0 and 1 for now, other lengths to be implemented
# optional, default is 0 (disable n-grams)
#
ngram_len = 1
# n-gram characters list, for CJK indexing
# optional, default is empty
#
ngram_chars = U+3000..U+2FA1F
# phrase boundary characters list
# optional, default is empty
#
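# phrase_boundary = ., ?, !, U+2026 # horizontal ellipsis
# phrase boundary word position increment
# optional, default is 0
#
# phrase_boundary_step = 100
# blended characters list
# blended chars are indexed both as separators and valid characters
# for instance, AT&T will result in 3 tokens ("at", "t", and "at&t")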
# optional, default is empty
#
# blend_chars = +, &, U+23
# blended token indexing mode
# a comma separated list of blended token indexing variants
# known variants are trim_none, trim_head, trim_tail, trim_both, skip_pure
# optional, default is trim_none
#
# blend_mode = trim_tail, skip_pure
# whether to strip HTML tags from incoming documents
# known values are 0 (do not strip) and 1 (do strip)
# optional, default is 0
html_strip = 0
# what HTML attributes to index if stripping HTML
# optional, default is empty (do not index anything)
#
# html_index_attrs = img=alt,title; a=title;
# what HTML elements contents to strip
# optional, default is empty (do not strip element contents)
#
# html_remove_elements = style, script
# whether to preopen index data files on startup
# optional, default is 0 (do not preopen), searchd-only
#
# preopen = 1
# whether to enable in-place inversion (2x less disk, 90-95% speed)
# optional, default is 0 (use separate temporary files), indexer-only
#
# inplace_enable = 1
# in-place fine-tuning options
# optional, defaults are listed below
#
# inplace_hit_gap = 0 # preallocated hitlist gap size
# inplace_docinfo_gap = 0 # preallocated docinfo gap size
# inplace_reloc_factor = 0.1 # relocation buffer size within arena
# inplace_write_factor = 0.1 # write buffer size within arena
# whether to index original keywords along with stemmed versions
# enables "=exactform" operator to work
# optional, default is 0
#
# index_exact_words = 1
# position increment on overshort (less than min_word_len) words
# optional, allowed values are 0 and 1, default is 1
#
# overshort_step = 1
# position increment on stopword
# optional, allowed values are 0 and 1, default is 1
#
# stopword_step = 1
# hitless words list
# positions for these keywords will not be stored in the index
# optional, allowed values are 'all', or a list file name
#
# hitless_words = all
# hitless_words = hitless.txt
# detect and index sentence and paragraph boundaries
# required for the SENTENCE and PARAGRAPH operators to work
# optional, allowed values are 0 and 1, default is 0
#
# index_sp = 1
# index zones, delimited by HTML/XML tags
# a comma separated list of tags and wildcards
# required for the ZONE operator to work
# optional, default is empty string (do not index zones)
#
# index_zones = title, h*, th
# index per-document and average per-index field lengths, in tokens
# required for the BM25A(), BM25F() in expression ranker
# optional, default is 0 (do not index field lengths)
#
# index_field_lengths = 1
# regular expressions (regexps) to filter the fields and queries with
# gets applied to data source fields when indexing
# gets applied to search queries when searching
# multi-value, optional, default is empty list of regexps
#
# regexp_filter = \b(\d+)\" => \1inch
# regexp_filter = (blue|red) => color
# list of the words considered frequent with respect to bigram indexing
# optional, default is empty
#
# bigram_freq_words = the, a, i, you, my
# bigram indexing mode
# known values are none, all, first_freq, both_freq
# option, default is none (do not index bigrams)
#
# bigram_index = both_freq
# snippet document file name prefix
# prepended to file names when generating snippets using load_files option
# WARNING, this is a prefix (not a path), trailing slash matters!
# optional, default is empty
#
# snippets_file_prefix = /mnt/mydocs/server1
# whether to apply stopwords before or after stemming
# optional, default is 0 (apply stopwords after stemming)
#
# stopwords_unstemmed = 0
# path to a global (cluster-wide) keyword IDFs file
# optional, default is empty (use local IDFs)
#
# global_idf = /usr/local/sphinx/var/global.idf
}
# inherited index example
#
# all the parameters are copied from the parent index,
# and may then be overridden in this index definition
#index test1stemmed : test1
#{
# path = /usr/local/sphinx/var/data/test1stemmed
# morphology = stem_en
#}
# distributed index example
#
# this is a virtual index which can NOT be directly indexed,
# and only contains references to other local and/or remote indexes
#index dist1
#{
# 'distributed' index type MUST be specified
# type = distributed
# local index to be searched
# there can be many local indexes configured
# local = test1
# local = test1stemmed
# remote agent
# multiple remote agents may be specified
# syntax for TCP connections is 'hostname:port:index1,[index2[,...]]'
# syntax for local UNIX connections is '/path/to/socket:index1,[index2[,...]]'
# agent = localhost:9313:remote1
# agent = localhost:9314:remote2,remote3
# agent = /var/run/searchd.sock:remote4
# remote agent mirrors groups, aka mirrors, aka HA agents
# defines 2 or more interchangeable mirrors for a given index part
#
# agent = server3:9312 | server4:9312 :indexchunk2
# agent = server3:9312:chunk2server3 | server4:9312:chunk2server4
# agent = server3:chunk2server3 | server4:chunk2server4
# agent = server21|server22|server23:chunk2
# blackhole remote agent, for debugging/testing
# network errors and search results will be ignored
#
# agent_blackhole = testbox:9312:testindex1,testindex2
# persistently connected remote agent
# reduces connect() pressure, requires that workers IS threads
#
# agent_persistent = testbox:9312:testindex1,testindex2
# remote agent connection timeout, milliseconds
# optional, default is 1000 ms, ie. 1 sec
# agent_connect_timeout = 1000
# remote agent query timeout, milliseconds
# optional, default is 3000 ms, ie. 3 sec
# agent_query_timeout = 3000
# HA mirror agent strategy
# optional, defaults to ??? (random mirror)
# known values are nodeads, noerrors, roundrobin, nodeadstm, noerrorstm
#
# ha_strategy = nodeads
# path to RLP context file
# optional, default is empty
#
# rlp_context = /usr/local/share/sphinx/rlp/rlp-context.xml
#}
# realtime index example
#
# you can run INSERT, REPLACE, and DELETE on this index on the fly
# using MySQL protocol (see 'listen' directive below)
#index rt
#{
# 'rt' index type must be specified to use RT index
# type = rt
# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
# path = /usr/local/sphinx/var/data/rt
# RAM chunk size limit
# RT index will keep at most this much data in RAM, then flush to disk
# optional, default is 128M
#
# rt_mem_limit = 512M
# full-text field declaration
# multi-value, mandatory
# rt_field = title
# rt_field = content
# unsigned integer attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# declares an unsigned 32-bit attribute
# rt_attr_uint = gid
# RT indexes currently support the following attribute types:
# uint, bigint, float, timestamp, string, mva, mva64, json
#
# rt_attr_bigint = guid
# rt_attr_float = gpa
# rt_attr_timestamp = ts_added
# rt_attr_string = author
# rt_attr_multi = tags
# rt_attr_multi_64 = tags64
# rt_attr_json = extra_data
#}
#############################################################################
## indexer settings
#############################################################################
indexer
{
# memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
# optional, default is 128M, max is 2047M, recommended is 256M to 1024M
mem_limit = 128M
# maximum IO calls per second (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iops = 40
# maximum IO call size, bytes (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iosize = 1048576
# maximum xmlpipe2 field length, bytes
# optional, default is 2M
#
# max_xmlpipe2_field = 4M
# write buffer size, bytes
# several (currently up to 4) buffers will be allocated
# write buffers are allocated in addition to mem_limit
# optional, default is 1M
#
# write_buffer = 1M
# maximum file field adaptive buffer size
# optional, default is 8M, minimum is 1M
#
# max_file_field_buffer = 32M
# how to handle IO errors in file fields
# known values are 'ignore_field', 'skip_document', and 'fail_index'
# optional, default is 'ignore_field'
#
# on_file_field_error = skip_document
# lemmatizer cache size
# improves the indexing time when the lemmatization is enabled
# optional, default is 256K
#
# lemmatizer_cache = 512M
}
#############################################################################
## searchd settings
#############################################################################
searchd
{
# [hostname:]port[:protocol], or /unix/socket/path to listen on
# known protocols are 'sphinx' (SphinxAPI) and 'mysql41' (SphinxQL)
#
# multi-value, multiple listen points are allowed
# optional, defaults are 9312:sphinx and 9306:mysql41, as below
#
# listen = 127.0.0.1
# listen = 192.168.0.1:9312
# listen = 9312
# listen = /var/run/searchd.sock
listen = 9312
listen = 9306:mysql41
# log file, searchd run info is logged here
# optional, default is 'searchd.log'
log = /usr/local/sphinx/var/log/searchd.log
# query log file, all search queries are logged here
# optional, default is empty (do not log queries)
query_log = /usr/local/sphinx/var/log/query.log
# client read timeout, seconds
# optional, default is 5
read_timeout = 5
# request timeout, seconds
# optional, default is 5 minutes
client_timeout = 300
# maximum amount of children to fork (concurrent searches to run)
# optional, default is 0 (unlimited)
max_children = 30
# maximum amount of persistent connections from this master to each agent host
# optional, but necessary if you use agent_persistent. It is reasonable to set the value
# as max_children, or less on the agent's hosts.
persistent_connections_limit = 30
# PID file, searchd process ID file name
# mandatory
pid_file = /usr/local/sphinx/var/log/searchd.pid
# seamless rotate, prevents rotate stalls if precaching huge datasets
# optional, default is 1
seamless_rotate = 1
# whether to forcibly preopen all indexes on startup
# optional, default is 1 (preopen everything)
preopen_indexes = 1
# whether to unlink .old index copies on successful rotation.
# optional, default is 1 (do unlink)
unlink_old = 1
# attribute updates periodic flush timeout, seconds
# updates will be automatically dumped to disk this frequently
# optional, default is 0 (disable periodic flush)
#
# attr_flush_period = 900
# MVA updates pool size
# shared between all instances of searchd, disables attr flushes!
# optional, default size is 1M
mva_updates_pool = 1M
# max allowed network packet size
# limits both query packets from clients, and responses from agents
# optional, default size is 8M
max_packet_size = 8M
# max allowed per-query filter count
# optional, default is 256
max_filters = 256
# max allowed per-filter values count
# optional, default is 4096
max_filter_values = 4096
# socket listen queue length
# optional, default is 5
#
# listen_backlog = 5
# per-keyword read buffer size
# optional, default is 256K
#
# read_buffer = 256K
# unhinted read size (currently used when reading hits)
# optional, default is 32K
#
# read_unhinted = 32K
# max allowed per-batch query count (aka multi-query count)
# optional, default is 32
max_batch_queries = 32
# max common subtree document cache size, per-query
# optional, default is 0 (disable subtree optimization)
#
# subtree_docs_cache = 4M
# max common subtree hit cache size, per-query
# optional, default is 0 (disable subtree optimization)
#
# subtree_hits_cache = 8M
# multi-processing mode (MPM)
# known values are none, fork, prefork, and threads
# threads is required for RT backend to work
# optional, default is threads
workers = threads # for RT to work
# max threads to create for searching local parts of a distributed index
# optional, default is 0, which means disable multi-threaded searching
# should work with all MPMs (ie. does NOT require workers=threads)
#
# dist_threads = 4
# binlog files path; use empty string to disable binlog
# optional, default is build-time configured data directory
#
# binlog_path = # disable logging
# binlog_path = /usr/local/sphinx/var/data # binlog.001 etc will be created there
# binlog flush/sync mode
# 0 means flush and sync every second
# 1 means flush and sync every transaction
# 2 means flush every transaction, sync every second
# optional, default is 2
#
# binlog_flush = 2
# binlog per-file size limit
# optional, default is 128M, 0 means no limit
#
# binlog_max_log_size = 256M
# per-thread stack size, only affects workers=threads mode
# optional, default is 64K
#
# thread_stack = 128K
# per-keyword expansion limit (for dict=keywords prefix searches)
# optional, default is 0 (no limit)
#
# expansion_limit = 1000
# RT RAM chunks flush period
# optional, default is 0 (no periodic flush)
#
# rt_flush_period = 900
# query log file format
# optional, known values are plain and sphinxql, default is plain
#
# query_log_format = sphinxql
# version string returned to MySQL network protocol clients
# optional, default is empty (use Sphinx version)
#
# mysql_version_string = 5.0.37
# default server-wide collation
# optional, default is libc_ci
#
# collation_server = utf8_general_ci
# server-wide locale for libc based collations
# optional, default is C
#
# collation_libc_locale = ru_RU.UTF-8
# threaded server watchdog (only used in workers=threads mode)
# optional, values are 0 and 1, default is 1 (watchdog on)
#
# watchdog = 1
# costs for max_predicted_time model, in (imaginary) nanoseconds
# optional, default is "doc=64, hit=48, skip=2048, match=64"
#
# predicted_time_costs = doc=64, hit=48, skip=2048, match=64
# current SphinxQL state (uservars etc) serialization path
# optional, default is none (do not serialize SphinxQL state)
#
# sphinxql_state = sphinxvars.sql
# maximum RT merge thread IO calls per second, and per-call IO size
# useful for throttling (the background) OPTIMIZE INDEX impact
# optional, default is 0 (unlimited)
#
# rt_merge_iops = 40
# rt_merge_maxiosize = 1M
# Interval between agent mirror pings, in milliseconds
# 0 means disable pings
# optional, default is 1000
#
# ha_ping_interval = 0
# agent mirror statistics window size, in seconds
# stats older than the window size (karma) are retired
# that is, they will not affect master choice of agents in any way
# optional, default is 60 seconds
#
# ha_period_karma = 60
# delay between preforked children restarts on rotation, in milliseconds
# optional, default is 0 (no delay)
#
# prefork_rotation_throttle = 100
# a prefix to prepend to the local file names when creating snippets
# with load_files and/or load_files_scatter options
# optional, default is empty
#
# snippets_file_prefix = /mnt/common/server1/
}
#############################################################################
## common settings
#############################################################################
#common
#{
# lemmatizer dictionaries base path
# optional, default is /usr/local/share (see ./configure --datadir)
#
# lemmatizer_base = /usr/local/share/sphinx/dicts
# how to handle syntax errors in JSON attributes
# known values are 'ignore_attr' and 'fail_index'
# optional, default is 'ignore_attr'
#
# on_json_attr_error = fail_index
# whether to auto-convert numeric values from strings in JSON attributes
# with auto-conversion, string value with actually numeric data
# (as in {"key":"12345"}) gets stored as a number, rather than string
# optional, allowed values are 0 and 1, default is 0 (do not convert)
#
# json_autoconv_numbers = 1
# whether and how to auto-convert key names in JSON attributes
# known value is 'lowercase'
# optional, default is unspecified (do nothing)
#
# json_autoconv_keynames = lowercase
# path to RLP root directory
# optional, default is /usr/local/share (see ./configure --datadir)
#
# rlp_root = /usr/local/share/sphinx/rlp
# path to RLP environment file
# optional, default is /usr/local/share/rlp-environment.xml (see ./configure --datadir)
#
# rlp_environment = /usr/local/share/sphinx/rlp/rlp/etc/rlp-environment.xml
# maximum total size of documents batched before processing them by the RLP
# optional, default is 51200
#
# rlp_max_batch_size = 100k
# maximum number of documents batched before processing them by the RLP
# optional, default is 50
#
# rlp_max_batch_docs = 100
# trusted plugin directory
# optional, default is empty (disable UDFs)
#
# plugin_dir = /usr/local/sphinx/lib
#}
# --eof--
With the comments removed, the configuration is:
source src1
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = 2018
sql_db = tpblog
sql_sock = /tmp/mysql.sock
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query = select id,des,des as attr_des,keywords,keywords as attr_keywords,content,content as attr_content from tpblog_arcdata # the selected id is the table's primary key and must be included
sql_attr_uint = article_aid # must not be a primary key (not even another table's)
sql_attr_string = attr_des
sql_attr_string = attr_keywords
sql_attr_string = attr_content
sql_ranged_throttle = 0
}
index test1
{
source = src1
path = /usr/local/sphinx/var/data/test1
docinfo = extern
dict = keywords
mlock = 0
morphology = none
min_word_len = 1
ngram_len = 1
ngram_chars = U+3000..U+2FA1F
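html_strip = 1 # default is 0; if the rows were saved from a rich-text editor they contain HTML tags, and with 0 a search for a tag name matches every row; with 1 the tags are stripped and no longer match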
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = /usr/local/sphinx/var/log/searchd.log
query_log = /usr/local/sphinx/var/log/query.log
read_timeout = 5
client_timeout = 300
max_children = 30
persistent_connections_limit = 30
pid_file = /usr/local/sphinx/var/log/searchd.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
mva_updates_pool = 1M
max_packet_size = 8M
max_filters = 256
max_filter_values = 4096
max_batch_queries = 32
}
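A comment-free listing like the one above does not have to be produced by hand. The sketch below filters out comment-only and blank lines with grep; strip_comments is a hypothetical helper name, and the sample file is a shortened stand-in for sphinx.conf.dist. Trailing comments that share a line with a setting are deliberately left alone.

```shell
#!/bin/sh
# Hypothetical helper: print a config file without comment-only and blank lines.
strip_comments() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Shortened stand-in for sphinx.conf.dist, written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# data source definition
source src1
{
    # data source type
    type = mysql

    sql_port = 3306 # optional, default is 3306
}
EOF

stripped=$(strip_comments "$sample")
echo "$stripped"
rm -f "$sample"
```

Running strip_comments on the real file and redirecting the output to a new file gives a compact config that is easier to review.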
Build the index:
cd /usr/local/sphinx/bin
Run:
./indexer --all
The index is built successfully.
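The sketch below spells out the indexing step with an explicit config path. It only prints the commands (a dry run) and rests on two assumptions not stated in the text: that indexer reads sphinx.conf rather than sphinx.conf.dist unless told otherwise, so the edited file is copied first, and that searchd has to be started before anything can connect on port 9312 or 9306. plan_indexing is a hypothetical helper name.

```shell
#!/bin/sh
SPHINX_HOME="/usr/local/sphinx"
CONF="$SPHINX_HOME/etc/sphinx.conf"

# Hypothetical helper: print the commands instead of running them (dry run).
plan_indexing() {
    # Assumption: the edited sphinx.conf.dist is copied to the name the tools expect.
    echo "cp $SPHINX_HOME/etc/sphinx.conf.dist $CONF"
    # Build every index defined in the config.
    echo "$SPHINX_HOME/bin/indexer --config $CONF --all"
    # Start the search daemon with the same config.
    echo "$SPHINX_HOME/bin/searchd --config $CONF"
}

plan_indexing
```

Passing --config explicitly avoids any doubt about which file the tools picked up.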
Open the ports:
Open ports 9306 and 9312 (the two listen ports set in the searchd section).
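How the ports are opened depends on the firewall in use, which the text does not specify. The sketch below assumes plain iptables and only prints the rules for the two searchd listen ports instead of applying them; open_port_rule is a hypothetical helper name. Systems using firewalld or a cloud security group need their own equivalent.

```shell
#!/bin/sh
# Hypothetical helper: print an iptables rule that accepts TCP traffic on a port.
open_port_rule() {
    echo "iptables -I INPUT -p tcp --dport $1 -j ACCEPT"
}

# 9312 is the SphinxAPI listener, 9306 the SphinxQL (mysql41) listener.
for port in 9312 9306; do
    open_port_rule "$port"
done
```

If the PHP application runs on the same host as searchd, connections to localhost usually do not need any firewall change at all.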
Copy sphinxapi.php and test.php from /home/wwwroot/default/www/sphinx-2.2.11-release/api into the project's root directory.
Using it in the project:
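public function index() {
    // keyword from the query string; empty string if none was sent
    $name = isset($_GET['name']) ? $_GET['name'] : '';
    require_once ( "/home/wwwroot/default/www/sphinx-2.2.11-release/api/sphinxapi.php" );
    $cl = new \SphinxClient ();
    $q = $name; // the search keyword passed in by the request
    $host = "localhost";
    $port = 9312;
    $index = 'test1'; // index name
    $limit = 20; // maximum number of results to return
    $cl->SetServer ( $host, $port );
    $cl->SetConnectTimeout ( 1 );
    $cl->SetArrayResult ( true );
    if ( $limit ) $cl->SetLimits ( 0, $limit, ( $limit>1000 ) ? $limit : 1000 );
    $res = $cl->Query ( $q, $index );
    if ( $res === false ) {
        die ( "Query failed: " . $cl->GetLastError() );
    }
    // assumption: the tail of the original method was lost; printing the
    // matches is a reconstruction based on the output shown below
    print_r ( isset($res['matches']) ? $res['matches'] : array() );
}
Example output: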
Array
(
[0] => Array
(
[id] => 1
[weight] => 2356
[attrs] => Array
(
[attr_des] => 我生在艰苦的年代,童年时,家里并不富裕,有时吃了上顿没了下顿,哪还有钱买生日蛋糕啊,更不要说其它的生日礼物了,能吃上一碗热腾腾的面条就很不错了。记得有一年我生日,妈妈为我能吃上一
[attr_keywords] => 老虎
[attr_content] =>
我有一只布老虎,是我很小的时候,妈妈送给我的生日礼物。
我生在艰苦的年代,童年时,家里并不富裕,有时吃了上顿没了下顿,哪还有钱买生日蛋糕啊,更不要说其它的生日礼物了,能吃上一碗热腾腾的面条就很不错了。记得有一年我生日,妈妈为我能吃上一顿面条,把家里仅有的两碗小麦,用石磨磨成面粉,然后留下一碗面粉,其余的面粉擀成面条。
吃饭时,我看见只有我的碗里全是面条,爸爸妈妈和两个弟弟碗里大多是山芋,我赶紧把面条分点给爸爸妈妈和两个弟弟,爸爸妈妈说什么也不要,只有两个弟弟叫喳喳地:“我要我要!”就这样我又快乐地过了一个生日。
随着我们渐渐长大,日子也好了很多。一家人过生日的时候,都能饱饱地吃上一顿面条。秋去冬来,我快十岁了,妈妈为赶在我的生日前,送我一只布老虎,坐在火炉旁一针一线,不知熬了多少个夜晚,才把一只栩栩如生的布老虎做好。这个生日最值得我怀念了,不但有面条吃,还有一个特别可爱的玩具,我高兴极了,每天睡觉都抱在怀里。
据说布老虎是一种古代就流传的工艺品,它品种繁多,是驱邪避灾、平安吉祥的象征,而且能保护财富。这或许是妈妈用心良苦的原因吧。
)
)
[1] => Array
(
[id] => 7
[weight] => 2353
[attrs] => Array
(
[attr_des] => 我没有想到,在面临全族灭绝的关健时刻,班羚们竟然能想出牺牲一半挽救一半的办法来赢得家族的生存机会。我更没想到,老班羚们会那么从容地面对死亡,即使自己被摔得粉身碎骨,也心甘情愿
[attr_keywords] => 班羚 跳跃
[attr_content] =>
我曾见过一场异常悲壮的死亡,正是那次死亡深深震撼了我,我从此发誓不再伤害哪怕再微小的生命……
那是在一次围猎班羚的过程中。班羚又名青羊,形似家养山羊,擅长跳跃,每头成年班羚重约30多公斤,性情温驯,是猎人最喜欢的动物。
那次,我们狩猎队严密堵截,把一群60多只羚羊逼到布朗山的断命岩上,想把它们逼下岩去摔死,以免浪费子弹。
)
)
[2] => Array
(
[id] => 5
[weight] => 1401
[attrs] => Array
(
[attr_des] => 克拉伦斯非常难过,因为他的计划完全失败了。同时,他也非常生气。因为当菲利克斯像一袋水泥一样落下时,完全忘记了控制下落的速度,也完全忘记了“落下要更灵巧,而不是更重”的忠告。
[attr_keywords] => 青蛙
[attr_content] =>
总是在夜阑人静时,伴着徐徐柔风,鼓腮而鸣。遍布乡村的空灵之音,仿佛要给馨香若醉的四野,传送一缕辛勤耕耘的清凉。站在路边,伫于村口,抑或走近待熟的庄稼,到处都可以领略到跳跃的韵律。一不小心,还会吓你一个激灵。不过吓过之后,它定会在不远处,瞪着圆圆鼓鼓的眼睛,观察你的反应。你动,他动。你不动,它不动。好像要把你的心思,引向随风飘游的稻香。
夏末初秋,少了些许骄阳似火的炎热,这时节,青蛙便感觉最惬意。因为它们呱呱的叫声,不会惹得人们心烦,反而会给日夜操劳的乡亲,带来即将收割的喜悦。的确是这样,每当忙碌了一天的人们,在欢声笑语中吃完一顿并不丰盛但却甜蜜的晚餐,便会打着饱饱的嗝,倒背着长满老茧的大手,慢悠悠地行走在田间小道上,哼着虽然跑了调但自己却依然满意的山歌,一边尽情地吮吸着瓜果的味道,一边仔细地聆听着虫吟的美妙。那情形,如泼墨大师描绘的一幅田园山水。
记得小时候,春暖花开,芳菲遍地,门前的小河哗哗地流淌着乡村渐红渐暖的热情。逆流而上的小鱼,摇着欢快的尾巴,偶尔把头探出水面,吹一个调皮的气泡,宛若要给萌动的季节,缀上一个传神的标点。
为什么不是呢?烦躁的季节,听一曲蛙鸣,感受美妙的乐音,也是一种久违的禅境!
)
)
[3] => Array
(
[id] => 3
[weight] => 1387
[attrs] => Array
(
[attr_des] => 公园的某个角落有一群猴子,铁栅栏外面是前来游玩的人。人看着猴子;猴子看着人。虽然都在看,想法定不同。人的思维能力举世无双自然不消说的,一看到某种东西对自己的心思,就美其名曰:欣赏。
[attr_keywords] => 猴子
[attr_content] =>
公园的某个角落有一群猴子,铁栅栏外面是前来游玩的人。人看着猴子;猴子看着人。虽然都在看,想法定不同。人的思维能力举世无双自然不消说的,一看到某种东西对自己的心思,就美其名曰:欣赏。由此而论抑或是猜想,人们一定以好奇的心理观赏猴子,身心都很娱乐。那么猴子呢,就简单的多了——它们可能什么都没想,即便想也一定是自己的事情,不会对某个人特别感兴趣。因为在猴子的眼里,人的模样都差不多,就像咱看非洲黑人。有一点也许可以肯定:因为受到了某种来自人类的不公平待遇,他们私底下会认为:人,真不是东西!
猴子的日子比人过得好。它们每天山上采食野果充饥,之后是公猴母猴谈恋爱,大猴子带着小猴子玩,之后就是攀援树木山岩,尽情嬉戏玩耍。摘野果比武艺。比较而言,人就活得累多了。到底如何累法用不着多啰嗦,大家心中有数。假如我们随便说一个人活得不如猴子,那被说的人一定十分的反感,以为是污蔑。人啊,大都以为自己的生存状态优越于猴子,其实往深里想想,就知道我们人类实在是活得太累太累,那么多的不如意不自由,那么多的束缚欠洒脱。想想就叫人不寒而栗。上学被书包压得喘不过气来;毕业找不到合适的工作;谈对象高不成低不就;工作了职务太低;上有老下有小顾头不顾腚;到老了子女又不孝顺,诸如此类的不如意你一点都不占,除非你是不是人间烟火的神仙。比比猴子,那么到底谁的生活状态更值得肯定呢?任何的比较都须有标准,没有标准的比较毫无意义。对于生命而言,唯一的标准就是活得幸福安康快乐无忧。这样一比就不难判断出到底谁的生活质量高了。从进化论角度看,人和猴子应是同一祖先,然而人和猴子永远是并列关系而不是递进关系。猴子要想变成人不可能,人也不会变成猴子,即使能够也没人去变,人们大都以为自己比猴子高级。事实到底如何?也用不着我来啰唆。
我们人类曾经多么的自豪!将来将要多么的自豪!人类的进步常常是大自然的灾难;将来也很可能就是自己的甚至殃及后代的灾难。人类实在是进化的太快了。快得使我们忘记了森林曾是我们的故乡,忘记了蓝天碧水的无以伦比的重要性,我们住在钢筋水泥隔就的空间里玩弄高科技,其目的就是不断的与大自然拉开距离,让更多的不起眼的小生命失去生存的可能。不要邻居,拒绝朋友,人类的自作聪明已非一日,回归自然的思考一直停留在办公桌的表面,桌子头上的舞蹈再跳一百年还是老样子。大自然的忍耐力毕竟有限,还是小心为妙!小心无大错!人,不能毫无顾忌。据说当年孟子的母亲面对外出的儿子说过这样一句话:小心,天下去得。对于亚圣母亲的一句名言,我们大多数人的的确确是忽略了。那么猴子呢,它们的进步远不会与人类同日而语,也许永远是落后于我们的,然而它们无愧于这个星球,因为他们能够与所有的他生物和平相处。在大力提倡和谐的当下,他们的生存状态倒是值得称道的。我们人类永远也不要自以为是,永远也不要自作聪明。如果我们单单从世界大和谐角度来看,人类一点不比猴子先进!我平生爱看也闹,任何带有娱乐性质的活动都愿意参加,可是有一种活动我很反感,那就是街头上的耍猴子。一大群人围着几只猴子观看,嘻嘻呵呵。那耍猴子的人手提一面锣,当当当敲着,另一只手扯着一根或几根绳子,那绳子皆是拴在猴子们的脖子上的,伴随着锣声,猴子们被逼着按照主人的指令完成各类动作,围观人从猴子们的举动中获取乐趣。猴子举动稍有不对就会惨遭鞭笞,人们的开心就建造在猴子的痛苦之上,有趣吗?古人云,勿以恶小而为之,勿以善小而不为,己所不欲勿施于人,人们的良心虽然看不见摸不着,却是实实在在存在的,并且须臾不可偏废!我们人类一直就是以牺牲他类的幸福甚至生存换取自己的舒适的,这从善良的角度看是很不人道的。
我们有时候会用怅惘的心境回望农耕时代的安宁与纯净;有时候免不了就想:不懂现代高科技的猴子的生活技巧比我们更接近生命的本质,是不是?
)
)
[4] => Array
(
[id] => 8
[weight] => 1387
[attrs] => Array
(
[attr_des] => 说实话,狮子林并不在我记忆的范畴。然而却因为导游的讲解,使我感到此园的和蔼、亲切。在脑中浮现的,只有我的现任班主任而已。您别奇怪我为和会想到班主任,其实原因颇多。
[attr_keywords] => 狮子
[attr_content] =>
苏州的狮子林,作为元代的古建筑,以及享有的“假山王国”的美誉,吸引着慕名而来的古今游人。我便是那茫茫人海中的一朵小小浪花。
据说当年乾隆老爷子也来此游玩。游兴未尽的万岁,还下令在北京的圆明园、承德的避暑山庄内仿建了两座狮子林。可见当年乾隆皇帝对狮子林的情有独钟。
)
)
[5] => Array
(
[id] => 6
[weight] => 1356
[attrs] => Array
(
[attr_des] => 虎毒不食子,母爱众生同!
可是,与动物相比,我们的人性又有多少高尚之处呢?有时,即使不存恶意地进入动物的领域,都给动物造成伤害,更别说蓄意屠杀了。不久前,我听到印度的一位同行讲的一则有关犀牛的故事
[attr_keywords] => 猎物虎
[attr_content] =>
我们常说,“可怜天下父母心”,那大概是专指人类而言,但同在蓝天下的芸芸众生,包括各种鸟兽虫鱼,哪一个没有父母心、赤子情呢?可是我们人类居然视而不见,杀死动物的父母,喂养人类自己;侵占动物的生存空间,满足自己的私欲,公道何在?良心何在?最近,我参加中央电视台《视觉》栏目的一个节目,在演播室就此话题淋漓尽致地发了一番感慨,直说得善于言辞的主持人心事沉沉、无语凝噎。为什么?大概是这一桩桩、一件件有关鸟兽亲情的可怜且可悲的故事,深深触动了她的心。
一位猎人在追杀一只藏羚羊时,将羚羊逼向悬崖,使其走投无路。突然,这只藏羚羊不再奔跑,回头面对猎人跪下了。“奇怪,动物还会求生?”猎人思忖着,但他并未因之而动恻隐之心,依然举枪将近在咫尺的藏羚羊打死了。
拖着猎物回到住地,猎人解剖时发现,这只羚羊的腹中竟有一个胎儿。猎人怔住了:“这是一个即将生产的母亲!难怪她跪下求饶,原来是为了保全孩子的性命!”猎人的铁石心肠被感动了,“我干什么呀?真是禽兽不如!”终于,猎人丢掉猎枪,金盆洗手。
)
)
)
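Because the searchd section also listens on 9306:mysql41, the same search could be issued as a SphinxQL statement through a regular mysql client instead of the PHP API. The following is a minimal sketch under that assumption; sphinxql_escape and build_match_query are illustrative helper names, and the escaping only covers backslashes and single quotes in the string literal, not the full-text operators of the query syntax.

```shell
#!/bin/sh
# Escape backslashes and single quotes so the keyword can sit inside a
# single-quoted SphinxQL string literal.
sphinxql_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g"
}

# Build a full-text query. Usage: build_match_query INDEX KEYWORD [LIMIT]
# LIMIT defaults to 20, the same value used in the PHP example.
build_match_query() {
    printf "SELECT * FROM %s WHERE MATCH('%s') LIMIT %s;\n" \
        "$1" "$(sphinxql_escape "$2")" "${3:-20}"
}

# Example (needs a running searchd and the mysql client):
#   mysql -h 127.0.0.1 -P 9306 -e "$(build_match_query test1 '老虎' 20)"
```

Using 127.0.0.1 rather than localhost makes the mysql client connect over TCP to port 9306 instead of trying the local MySQL socket.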
Build the index
/home/www/sphinx-3.1.1/bin/indexer -c /home/www/sphinx-3.1.1/bin/sphinx.conf --all
Start sphinx
/home/www/sphinx-3.1.1/bin/searchd -c /home/www/sphinx-3.1.1/bin/sphinx.conf
Stop sphinx
/home/www/sphinx-3.1.1/bin/searchd -c /home/www/sphinx-3.1.1/bin/sphinx.conf --stop
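The three commands above differ only in the binary and the trailing flag, so they can be wrapped in one small helper. This is a sketch, not part of the Sphinx distribution: sphinx_cmd is a made-up name, and SPHINX_HOME and SPHINX_CONF simply default to the paths used above. The extra reindex case uses the indexer's --rotate flag, which is not mentioned in the steps above, to rebuild while searchd keeps running. The helper only prints the command line.

```shell
#!/bin/sh
# Install location and config file; override via the environment if needed.
SPHINX_HOME="${SPHINX_HOME:-/home/www/sphinx-3.1.1}"
SPHINX_CONF="${SPHINX_CONF:-$SPHINX_HOME/bin/sphinx.conf}"

# Print the command for one action: index, reindex, start or stop.
sphinx_cmd() {
    case "$1" in
        index)   echo "$SPHINX_HOME/bin/indexer -c $SPHINX_CONF --all" ;;
        reindex) echo "$SPHINX_HOME/bin/indexer -c $SPHINX_CONF --all --rotate" ;;
        start)   echo "$SPHINX_HOME/bin/searchd -c $SPHINX_CONF" ;;
        stop)    echo "$SPHINX_HOME/bin/searchd -c $SPHINX_CONF --stop" ;;
        *)       echo "usage: sphinx_cmd index|reindex|start|stop" >&2; return 1 ;;
    esac
}

# Example: print the start command, or pipe it to sh to actually run it.
#   sphinx_cmd start
#   sphinx_cmd start | sh
```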
Video URL: https://ke.qq.com/webcourse/index.html#cid=309238&term_id=100366592&taid=2108644259051510&vid=p1422t34kwo