linux - Solr server memory and disk space


Context:

I have an AWS EC2 instance with

  • 8 GB RAM
  • 8 GB of disk space

It runs Solr 5.1.0 with

  • a Java heap of 2048 MB
  • -Xms2048m -Xmx2048m

Extra: (updated)

  • Logs are generated on the server
  • Imports happen at intervals of 10 s (always delta)
  • I am importing from a DB (JdbcDataSource)
  • I don't think I have an optimization strategy configured right now
  • GC profiling? I don't know.
  • How can I find out how large the fields are, and what counts as large?

Situation:

The index on Solr has 200,000 documents and is queried not more than once per second. However, in about 10 days, the memory and disk space usage of the server reaches 90% - 95% of the available space.

When investigating disk usage, sudo du -sh / returns a total of 2.3G. That is not nearly as much as df -k tells me (Use% -> 92%).

I can, sort of, resolve the situation by restarting the Solr service.

What am I missing? How come Solr consumes all the memory and disk space, and how do I prevent it?

Extra info for @tmbt

Sorry for the delay, I've been monitoring the Solr production server for the last few days. You can see a roundup here: https://www.dropbox.com/s/x5diyanwszrpbav/screencapture-app-datadoghq-com-dash-162482-1468997479755.jpg?dl=0 The current state of Solr: https://www.dropbox.com/s/q16dc5t5ctl32od/screenshot%202016-07-21%2010.29.13.png?dl=0 I restarted Solr at the beginning of the monitoring and now, 2 days later, I see the disk space going down at a rate of 1.5 GB per day. If you need more specifics, let me know.

  • There are not many deleted docs per day. We're talking 50 - 250 per day max.
  • The current logs directory of Solr: ls -lh /var/solr/logs -> total 72M
  • There is no master-slave setup
  • The importer runs every 10 seconds, but imports no more than 10 - 20 docs each time. A large import of 3k-4k docs happens each night. There is not much other action going on in Solr at that time.
  • There are no large fields; the largest field can contain 255 chars.

With the monitoring in place I tested the most common queries. They contain faceting (field, queries), sorting, grouping, … but that doesn't really affect the various metrics of heap and GC count.

First, visit your.solr.instance:[port]/[corename]/admin/system and check to see how many resources Solr is actually using. The memory and system elements will be the most useful to you. It may be that something else on the box is the culprit for at least some of the resource usage.
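If you'd rather poll that page from a script than read it by hand, something like this rough Python sketch might do. The /solr prefix, the wt=json parameter and the jvm -> memory -> raw layout (with 'used' and 'max' in bytes) are my assumptions about what the handler returns, so compare them against the output of your own instance first.

```python
import json
from urllib.request import urlopen


def system_info_url(host, port, core):
    """Build the system-info URL; the /solr prefix and wt=json are assumptions."""
    return f"http://{host}:{port}/solr/{core}/admin/system?wt=json"


def summarize_memory(info):
    """Pull JVM heap figures out of a parsed system-info response.

    Assumes the response contains jvm -> memory -> raw with 'used' and
    'max' given in bytes; verify this against your instance.
    """
    raw = info["jvm"]["memory"]["raw"]
    used, limit = raw["used"], raw["max"]
    return {
        "used_mb": used / 2**20,
        "max_mb": limit / 2**20,
        "used_pct": 100.0 * used / limit,
    }


def fetch_system_info(url):
    """Fetch and parse the JSON response (needs a reachable Solr)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling summarize_memory(fetch_system_info(system_info_url("localhost", 8983, "mycore"))) every few minutes and logging the result would show whether heap usage is really climbing (host, port and core name here are placeholders).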

To me, the fact that you can "sort of" resolve the problem by restarting Solr screams "query and import obnoxiousness" for memory. For disk space, I wouldn't be surprised if it's the log files behind that. I also wonder if you're ending up with a lot of old, deleted files due to the numerous delta imports, lying around until Solr automatically deletes them. In fact, if you go to http://your.solr.instance:[port]/solr/#/[corename], you should be able to see how many deleted docs are in your index. If there's a very, very large number, you should schedule a time during low usage to run optimize to get rid of them.
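The deleted-docs figure can also be worked out from the index counters: maxDoc minus numDocs is the number of deleted documents still sitting in the segments. Here is a minimal sketch, assuming you fetched the Luke handler output (something like /solr/[corename]/admin/luke?numTerms=0&wt=json) and that it has an "index" section with numDocs and maxDoc. The function name is made up for illustration.

```python
def deleted_doc_stats(luke_response):
    """Return (deleted_count, deleted_ratio) from a parsed Luke response.

    Assumes luke_response["index"] carries 'numDocs' (live documents) and
    'maxDoc' (live plus deleted-but-not-yet-merged documents).
    """
    index = luke_response["index"]
    num_docs = index["numDocs"]
    max_doc = index["maxDoc"]
    deleted = max_doc - num_docs
    ratio = deleted / max_doc if max_doc else 0.0
    return deleted, ratio
```

What ratio counts as "very large" is a judgment call; I'd only start worrying when deleted docs make up a sizeable share of maxDoc.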

Also be aware that Solr seems to have a tendency of filling up as much of the given heap space as it can.

Since logs are generated on the server, check to see how many of them exist. Solr after 4.10 has a nasty habit of generating large numbers of log files, which can cause disk space issues, especially with how often you import. For information on how to deal with Solr's love of logging, I'm going to refer you to my self-answer at "Solr 5.1: Solr is creating way too many log files". Basically, you'll want to navigate to the Solr startup script to disable Solr's log backups and then replace them with a solution of your own.
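To see at a glance how many log files there are and which ones are the biggest, a small script like the one below may help. It is only a sketch: it looks at regular files directly inside one directory (for example /var/solr/logs from the question) and does not descend into subdirectories.

```python
import os


def summarize_logs(log_dir):
    """Return (file_count, total_bytes, files) for one log directory.

    'files' is a list of (name, size_in_bytes) sorted largest first.
    Only regular files directly inside log_dir are counted.
    """
    files = []
    for entry in os.scandir(log_dir):
        if entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            files.append((entry.name, size))
    files.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _, size in files)
    return len(files), total, files
```

Running it once a day and comparing the totals would tell you whether the logs account for the disk space that disappears.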

If you have a master-slave setup, check to see if your slave is backing up certain configuration files, like schema.xml or solrconfig.xml.

Depending on how many records are imported per delta, you could have commits overlapping each other, which will affect resource usage on your box. If you read anything in the logs about overlapping onDeckSearchers, this is definitely an issue for you.
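A quick way to check for this is to count the warning lines in the Solr log. The sketch below assumes the warning contains the text "Overlapping onDeckSearchers" and that the main log is /var/solr/logs/solr.log. Both are assumptions on my part, so adjust them to what your logs actually show.

```python
def count_overlapping_searcher_warnings(lines):
    """Count log lines that mention overlapping on-deck searchers.

    'lines' is any iterable of strings, for example an open log file.
    The marker text is an assumption about Solr's warning wording.
    """
    marker = "Overlapping onDeckSearchers"
    return sum(1 for line in lines if marker in line)


def count_in_file(path):
    """Convenience wrapper: count the warnings in one log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_overlapping_searcher_warnings(handle)
```

For example, count_in_file("/var/solr/logs/solr.log") returning anything above zero would suggest commits are arriving faster than new searchers can warm up.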

Lots of delta imports means lots of commits. A commit is a fairly heavy operation. You'll want to tweak solrconfig.xml to soft commit after a number of documents and hard commit after a little bit more. If you perform the commits in batches, your frequent deltas should have less of an impact.
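To show roughly what those settings look like, here is a small Python sketch that builds the relevant solrconfig.xml fragment with autoCommit and autoSoftCommit under updateHandler. The thresholds (100 and 500 docs) are made-up placeholders, and setting openSearcher to false is my own assumption of a sensible choice, not something your setup requires.

```python
import xml.etree.ElementTree as ET


def commit_settings(hard_max_docs, soft_max_docs):
    """Return an <updateHandler> fragment with batched commit thresholds.

    soft_max_docs should be smaller than hard_max_docs, so that a soft
    commit happens first and a hard commit follows a little later.
    """
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    hard = ET.SubElement(handler, "autoCommit")
    ET.SubElement(hard, "maxDocs").text = str(hard_max_docs)
    # Assumption: do not open a new searcher on hard commit.
    ET.SubElement(hard, "openSearcher").text = "false"

    soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(soft, "maxDocs").text = str(soft_max_docs)

    return ET.tostring(handler, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical thresholds; tune them to your import volume.
    print(commit_settings(hard_max_docs=500, soft_max_docs=100))
```

The printed elements would be merged by hand into the existing updateHandler section of solrconfig.xml rather than pasted in as a second one.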

If you are joining columns for your imports, you may need to index those joined columns in your database. If your database is not on the same machine as Solr, network latency is a possible problem. It's one I've struggled with in the past. If the DB is on the same machine and you need to index, then not indexing will most certainly have a negative effect on your box's resources.

It may be helpful to use VisualVM on Solr to view heap usage and GC. You want to make sure there's not a rapid increase in usage, and you also want to make sure that the GC isn't having a bunch of stop-the-world collections that can cause weirdness on your box.

Optimize is a very intensive operation that you shouldn't need to use often, if at all, after 4.10. Some people still do, though, and if you have tons of deleted documents it might be useful for you. If you ever decide to employ an optimization strategy, it should be done only during times of low usage, as optimize temporarily doubles the size of the index. Optimize merges segments and removes files marked for deletion by deltas.
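Because optimize can temporarily double the index size, it seems prudent to check free disk space before triggering it. The sketch below measures the index directory, compares it with the free space, and builds the update URL with optimize=true. The index path, host, port and core name are placeholders you would have to fill in, and the "free space at least equal to the index size" rule is just my reading of the doubling behaviour.

```python
import os
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen


def directory_size(path):
    """Total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for_optimize(index_bytes, free_bytes):
    """True if there is room for a second full copy of the index."""
    return free_bytes >= index_bytes


def optimize_url(host, port, core):
    """Build the update URL that asks Solr to optimize the core."""
    query = urlencode({"optimize": "true", "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/update?{query}"


def optimize_if_safe(index_dir, host, port, core):
    """Trigger optimize only when the disk can absorb the temporary growth."""
    free = shutil.disk_usage(index_dir).free
    if not has_room_for_optimize(directory_size(index_dir), free):
        return False
    with urlopen(optimize_url(host, port, core), timeout=3600) as resp:
        return resp.status == 200
```

On an 8 GB disk that is already above 90% full, this check would most likely refuse to run, which is a useful warning in itself.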

By "large fields", I mean fields with large amounts of data in them. You'd need to look up the size limits for each field type you're using, but if you're running towards the max size for a particular field, you may want to try to find a way to reduce the size of your data. Or you can omit importing those large columns into Solr and instead retrieve the data from the columns in the source DB after getting a particular document(s) from Solr. It depends on your setup and what you need. You may or may not be able to do much about it. If you get everything else running more efficiently you should be fine.
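As for finding out how large your fields actually are, one rough approach is to pull a sample of stored documents (for instance the "docs" list from a /select?q=*:*&wt=json response) and record the longest value seen per field. This is only a sketch: it measures characters in stored values, not the on-disk index size, and the function name is made up.

```python
def longest_field_values(docs):
    """Return {field_name: longest_value_length} over a list of documents.

    Each document is a dict as returned in a JSON query response.
    Multi-valued fields (lists) are measured value by value.
    """
    longest = {}
    for doc in docs:
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                length = len(str(item))
                if length > longest.get(field, 0):
                    longest[field] = length
    return longest
```

If nothing in the result comes close to the limits of the field types in use, large fields can probably be ruled out as a cause.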

The type of queries you run can also cause problems. Lots of sorting, faceting, etc. can be very memory-intensive. If I were you, I would hook VisualVM up to Solr so I could watch heap usage and GC, and then load test Solr using typical queries.

