Solr
ZooKeeper is very similar to Hadoop in spirit, and Solr is very similar to Nutch in spirit.
Note: Apache Solr ships with an embedded ZooKeeper. Never use it; always use an external ZooKeeper ensemble, set up by following the steps below.
Setting up a ZooKeeper ensemble (on different machines) - Refer here
- Assume that you have 3 servers: 10.2.4.120, 10.2.4.122, 10.2.4.126.
- Modify the /etc/hosts file on all 3 systems and include the lines:
    10.2.4.120 solr1
    10.2.4.122 solr2
    10.2.4.126 solr3
- Install ZooKeeper in /opt/zookeeper/ on all 3 machines.
- Create a folder called data in /opt/zookeeper/ on all 3 machines.
- In the folder /opt/zookeeper/data, create a file called myid on all 3 machines. The content of myid should be 1 on machine 1 (solr1), 2 on machine 2 (solr2), and 3 on machine 3 (solr3).
- You will have a zoo.cfg in /opt/zookeeper/conf/zoo.cfg. For the first server, called solr1, it will look like the following:
    dataDir=/opt/zookeeper/data
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=0.0.0.0:2888:3888
    server.2=solr2:2888:3888
    server.3=solr3:2888:3888
- For the other servers it will be similar. The only difference is that, on server 'x', the line for that machine itself becomes server.x=0.0.0.0:2888:3888, while the lines for the other two servers use their host names (for example, server.1=solr1:2888:3888 on solr2 and solr3).
- Go to the /opt/zookeeper folder and execute the following command on all 3 machines:
    sudo ./bin/zkServer.sh start [OPTIONAL CONF FILE]
  The default conf file is /opt/zookeeper/conf/zoo.cfg. This should start ZooKeeper on all 3 machines.
- Check that ZooKeeper has started properly on all 3 machines by issuing the following command:
    sudo ./bin/zkServer.sh status
  If ZooKeeper is running properly, you should see output like:
    JMX enabled by default
    Using config: /opt/zookeeper/bin/../conf/zoo.cfg
    Mode: follower
  The last line might also be Mode: leader. If there is an error, the last line will instead be a message saying that ZooKeeper is probably not running (for example, "Error contacting service. It is probably not running.").
- You can stop ZooKeeper by issuing the following command on all 3 systems:
    sudo ./bin/zkServer.sh stop
- ./bin/zkCli.sh - This opens the ZooKeeper command-line client, where a limited set of commands is available. Type help to see all of them.
Note 1: Having 3 machines does not mean that you have to run exactly 3 ZooKeeper instances; you can have more or fewer.
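The myid and zoo.cfg steps above can be scripted so that the same file works on every machine. The following is only a sketch under the assumptions of this section (host names solr1 to solr3, install directory /opt/zookeeper); write_zk_config is a hypothetical helper name, not something shipped with ZooKeeper.

```shell
#!/bin/sh
# Sketch: write data/myid and conf/zoo.cfg for one ensemble member.
# write_zk_config is a hypothetical helper, not part of ZooKeeper.
#   $1 = id of this machine (1, 2 or 3)
#   $2 = ZooKeeper install directory (defaults to /opt/zookeeper)
write_zk_config() {
    my_id="$1"
    zk_home="${2:-/opt/zookeeper}"

    mkdir -p "$zk_home/data" "$zk_home/conf"

    # myid holds nothing but the numeric id of this machine
    echo "$my_id" > "$zk_home/data/myid"

    {
        echo "dataDir=$zk_home/data"
        echo "clientPort=2181"
        echo "initLimit=5"
        echo "syncLimit=2"
        for n in 1 2 3; do
            if [ "$n" = "$my_id" ]; then
                # the entry for this machine itself binds to 0.0.0.0
                echo "server.$n=0.0.0.0:2888:3888"
            else
                # the other members are addressed by host name
                echo "server.$n=solr$n:2888:3888"
            fi
        done
    } > "$zk_home/conf/zoo.cfg"
}
```

On solr2, for instance, calling write_zk_config 2 (with write permission on /opt/zookeeper) should produce the layout described above; review the generated zoo.cfg before starting the server.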
Setting up Solr cloud (On different machines) - Help is here
- Assume that you have the same 3 servers: 10.2.4.120, 10.2.4.122, 10.2.4.126, where ZooKeeper is also running.
- Assume that your /etc/hosts is already modified as described in the /etc/hosts step of the ZooKeeper setup (above).
- Install Solr in /opt/solr/ on all 3 machines.
- If you have already run Solr before and created a collection (e.g. wikipedia) and cores (like wikipedia_shard1_replica1 or wikipedia_shard2_replica1), and you want to run searches on those cores, issue the following command on any or all of the systems [depending on the individual case under consideration]:
    bin/solr start -c -h solr3 -p 8983 -s collections_Home/solr/ -z solr1:2181,solr2:2181,solr3:2181 -m 2g
    -c: cloud mode.
    -h: host name of the machine on which you are running this [e.g. solr1, localhost, etc].
    -p: port on which you want to run this Solr instance.
    -s: solr_home_folder, the place where your collections/cores are located. Typically, this folder has one or more cores (which are also folders) like wikipedia_shard1_replica1 or wikipedia_shard2_replica1. It must contain a solr.xml file! It might also contain a conf dir.
    -z: the ZooKeeper ensemble that we set up previously.
    -m: Java memory that Solr may use.
- Note: For the purposes of these commands, collections and cores are treated as one and the same; we have collections in SolrCloud mode and cores in standalone mode.
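Since the start command differs between machines only in the -h value, it can help to assemble it from the list of ZooKeeper hosts. This is a sketch only; zk_connect and solr_start_cmd are hypothetical helper names, and the port, memory setting, and host names are the ones assumed in this section. The function prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Sketch: build the -z connection string and the per-node start command.
# zk_connect and solr_start_cmd are hypothetical helpers for illustration.

# zk_connect PORT HOST... -> "host1:PORT,host2:PORT,..."
zk_connect() {
    port="$1"
    shift
    out=""
    for h in "$@"; do
        # prepend a comma only when out is already non-empty
        out="${out:+$out,}$h:$port"
    done
    echo "$out"
}

# solr_start_cmd THIS_HOST SOLR_HOME -> prints the command line to run
solr_start_cmd() {
    echo "bin/solr start -c -h $1 -p 8983 -s $2 -z $(zk_connect 2181 solr1 solr2 solr3) -m 2g"
}
```

For example, solr_start_cmd solr3 collections_Home/solr/ prints the same command line as shown above for solr3.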
- Creating Solr collections or cores
    bin/solr create -c collection_name -d conf_dir -n config_name -p port -s num_of_shards -rf num_of_replicas
    -c: Collection name. Default is gettingstarted. By default, the final collection folders are stored in folders like example/cloud/node1/solr
    -d: Configuration directory. Default is data_driven_schema_configs.
    -n: Configuration name. Default is the same name as the core or collection.
    -p: Port. By default the script tries to detect the port by looking for running Solr instances.
    -s: Number of shards. Default is 1.
    -rf: Number of replicas. Default is 1.
  If you don't provide the -d or -n options, then the default configuration $SOLR_HOME/server/solr/configsets/data_driven_schema_configs/conf is uploaded to ZooKeeper using the same name as the collection (under /configs/collection_name in ZooKeeper).
  Note: The data_driven_schema_configs schema can mutate as data is indexed. Consequently, we recommend that you do not share data-driven configurations between collections unless you are certain that all collections should inherit the changes made when indexing data into one of the collections.
- Deleting Solr collections or cores
    bin/solr delete -c core_or_collection_name -deleteConfig true|false -p port
    -deleteConfig: Deletes the corresponding configuration directory from ZooKeeper. Default is true. If the configuration directory is being used by another collection, then it will not be deleted even if you pass -deleteConfig true.
- Note: Several of the things mentioned here [and much more] can also be done using API calls. Refer here
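The create and delete commands above can be wrapped so that they are printed first and only executed on request. This is a sketch, not part of Solr: run, create_collection and delete_collection are hypothetical helper names, DRY_RUN is an assumed environment variable, port 8983 is assumed, and the flags are the ones listed in this section.

```shell
#!/bin/sh
# Sketch: dry-run wrappers around bin/solr create and bin/solr delete.
# All function names and DRY_RUN are hypothetical, for illustration only.

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# create_collection NAME CONF_DIR SHARDS REPLICAS
# The configuration is uploaded under the assumed name NAME_conf.
create_collection() {
    run bin/solr create -c "$1" -d "$2" -n "${1}_conf" -p 8983 -s "$3" -rf "$4"
}

# delete_collection NAME  (also removes its configuration from ZooKeeper
# unless another collection still uses it)
delete_collection() {
    run bin/solr delete -c "$1" -deleteConfig true -p 8983
}
```

Run from the Solr install directory, create_collection wikipedia data_driven_schema_configs 2 1 only prints the command; setting DRY_RUN=0 in the environment makes it execute.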
- Useful scripts that integrate Solr with ZooKeeper (reference)
  Solr ships with a very useful script, /opt/solr/server/scripts/cloud-scripts/zkcli.sh. It can be used to upload a new configuration to ZooKeeper (it ends up under /configs/dir_name in ZooKeeper):
    ./server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir config_dir_to_upload -confname configuration_name_you_wish_to_give -z solr1:2181,solr2:2181,solr3:2181
  Note: ZooKeeper's own zookeeper_home/bin directory contains a similarly named script, zkCli.sh, but that is the plain ZooKeeper client and does not offer upconfig or linkconfig. With an external ensemble, keep using Solr's zkcli.sh and point -z at the ensemble.
  It can also be used to link a collection [present in ZooKeeper] with a configuration [also present in ZooKeeper]:
    ./server/scripts/cloud-scripts/zkcli.sh -cmd linkconfig -collection collection_name -confname configuration_name_in_zookeeper -z solr1:2181,solr2:2181,solr3:2181
- Status update in Solr
    bin/solr status
    bin/solr healthcheck -c collection_name -z zookeeper_host
- To stop Solr, use bin/solr stop -p port_number, or bin/solr stop -all to stop every running instance.
- To delete all the documents indexed in a core, open the following URL in the browser:
    http://host_name:port/solr/core_name_with_shardId_and_replica_id/update?stream.body=<delete><query>*:*</query></delete>&commit=true
- E.g: http://localhost:8984/solr/twitter_unprocessed_shard1_replica1/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
- There are other ways of deleting documents as well (currently, this seems to be the simplest). Note that this deletes the indexed documents in a core and NOT the core itself.
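The delete-all URL above can also be built on the command line, which avoids encoding the angle brackets by hand. This is a sketch; delete_all_url is a hypothetical helper name, and the host, port, and core name are whatever applies to your setup.

```shell
#!/bin/sh
# Sketch: build the URL that deletes every document in one core.
# delete_all_url is a hypothetical helper, for illustration only.
#   $1 = host name, $2 = port, $3 = core name (with shard and replica id)
delete_all_url() {
    # URL-encoded form of <delete><query>*:*</query></delete>
    body='%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E'
    printf 'http://%s:%s/solr/%s/update?stream.body=%s&commit=true\n' \
        "$1" "$2" "$3" "$body"
}
```

The result can be passed to a client such as curl, for example curl "$(delete_all_url localhost 8984 twitter_unprocessed_shard1_replica1)". Some newer Solr releases disable stream.body by default; in that case, posting the same XML as the request body to the update handler is the usual alternative.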