
calling hadoop admins or developers


Posted

Please suggest a Tez configuration for me. I will list my cluster config and my current settings here. I feel I'm underutilizing my cluster.

As for hardware: 1 master node with 64 vcores, 244 GB RAM, and a 640 GB SSD, plus 4 data nodes with the same config as the master.

Below is the Hive/Tez config:

hive-site    tez.am.resource.memory.mb    59205
hive-site    hive.tez.java.opts    -Xmx47364m
hive-site    hive.execution.engine    tez
hive-site    hive.vectorized.execution.enabled    true
hive-site    tez.am.grouping.max-size    36700160000
hive-site    hive.tez.container.size    59205
hive-site    hive.vectorized.execution.reduce.enabled    true
hive-site    tez.task.resource.memory.mb    10000
hive-site    tez.am.launch.cmd-opts    -Xmx47364m

 

With the current settings I can see only 15 mappers or reducers running at any given point. I read that reducing the container size, so that more containers run in parallel, makes queries run faster, and I want to try that.

https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html

I changed my Tez settings according to this doc, but containers are getting killed with a physical memory error.

 

Can anyone help? Thanks.
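
A rough back-of-the-envelope check of the 15-container observation, as a sketch only. It assumes YARN can hand out close to the full 244 GB on each of the 4 data nodes (the real limit is whatever `yarn.nodemanager.resource.memory-mb` is set to, which is not given in this thread) and that the Tez application master occupies 59205 MB on one of those nodes. The function name and the 5120 MB comparison value are illustrative.

```python
def max_parallel_tasks(node_mem_mb, nodes, container_mb, am_mb):
    """Memory-only upper bound on task containers running at the same time."""
    plain = node_mem_mb // container_mb              # containers on a node without the AM
    with_am = (node_mem_mb - am_mb) // container_mb  # containers on the node hosting the AM
    return plain * (nodes - 1) + with_am

NODE_MEM_MB = 244 * 1024   # assumption: all 244 GB per data node is usable by YARN
DATA_NODES = 4
AM_MB = 59205              # tez.am.resource.memory.mb from the post

current = max_parallel_tasks(NODE_MEM_MB, DATA_NODES, 59205, AM_MB)
smaller = max_parallel_tasks(NODE_MEM_MB, DATA_NODES, 5120, AM_MB)
print(current, smaller)    # 15 181
```

Under these assumptions the 59205 MB container size alone would explain a ceiling of 15 concurrent tasks. Vcores and queue limits can lower the real number further.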

Posted

I work on Spark, but I will see if I can help you.

  • tez.grouping.max-size (default 1073741824, which is 1 GB)
  • tez.grouping.min-size (default 52428800, which is 50 MB)
  • tez.grouping.split-count (not set by default)
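
To see what these two sizes imply, here is a minimal sketch that turns them into rough lower and upper bounds on the number of grouped splits (mappers) for a given input size. It is a simplification: Tez's real grouping also looks at available slots and groups whole splits, so treat the numbers as a range, not a prediction. The function name and the 10 GiB input are hypothetical.

```python
def mapper_count_bounds(total_bytes, min_size=52428800, max_size=1073741824):
    """Rough (fewest, most) mapper counts implied by the grouping min/max sizes."""
    fewest = -(-total_bytes // max_size)   # ceiling division: every group at max size
    most = -(-total_bytes // min_size)     # ceiling division: every group at min size
    return fewest, most

GIB = 1024 ** 3
bounds_10 = mapper_count_bounds(10 * GIB)
print(bounds_10)   # (10, 205)
```

Raising the max size pushes the lower bound down (fewer, larger mappers), while lowering it forces more mappers for the same data.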

http://www.openkb.info/2017/05/hive-on-tez-how-to-control-number-of.html

 

The number of mappers and reducers also depends on the container size.

Optimizing your job also depends on the size of the data you have.

Your data might be small, but if you are shuffling it too much, there will be a lot of overhead.

For example, if you get an error message like "9.2 GB out of 9 GB", make sure you have enough memory overhead. I'm not sure how that translates to Tez, so try googling it.
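
For context, YARN compares the whole process footprint of a container (JVM heap plus everything off-heap) against the container size and kills the container when it goes over. The sketch below pulls the two numbers out of such a message and reports how far over the limit the process was. The sample line is illustrative, built from the "9.2 GB out of 9 GB" figures above, and the message wording the regex expects is an assumption; adjust it to match the actual log.

```python
import re

UNIT_MB = {"MB": 1, "GB": 1024}

def physical_memory_overage_mb(message):
    """Return (used_mb, limit_mb, overage_mb) parsed from the message, or None if no match."""
    match = re.search(r"([\d.]+) (MB|GB) of ([\d.]+) (MB|GB) physical memory used", message)
    if match is None:
        return None
    used = float(match.group(1)) * UNIT_MB[match.group(2)]
    limit = float(match.group(3)) * UNIT_MB[match.group(4)]
    return used, limit, used - limit

# Illustrative message, not copied from a real log in this thread.
sample = "Current usage: 9.2 GB of 9 GB physical memory used"
result = physical_memory_overage_mb(sample)
print(result)
```

An overage of a couple of hundred MB usually points at too little headroom between `-Xmx` and the container size, not at the data volume itself.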

 

Posted

I'm getting a similar error, uncle. The data volume isn't that large, around 10-15 GB. Looking at the config, I feel it should process that easily.

What does "enough overhead" mean, uncle? Heap size?

Posted

Also, do you know of any active Hive/Hadoop forums where I can ask questions like this?

Posted

Yes, heap and GC.

That's what I thought. Check for skewed data.

If you are shuffling your data, it may be getting skewed after the shuffle.
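
One simple way to check for skew is to count rows per shuffle key and compare the largest group with a typical one. The sketch below does that on a made-up list of keys; in practice the counts would come from a `GROUP BY` on the join or grouping column. The function name, the sample data, and the choice of max-over-median as the measure are all illustrative assumptions.

```python
from collections import Counter
from statistics import median

def skew_ratio(keys):
    """Ratio of the largest key group to the median key group (1.0 means no skew)."""
    sizes = list(Counter(keys).values())
    return max(sizes) / median(sizes)

# Made-up shuffle keys: one key holds 90% of the rows.
sample_keys = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
ratio = skew_ratio(sample_keys)
print(ratio)   # 18.0
```

A high ratio means one reducer receives far more rows than the others, which can blow past the container's memory even when the total data volume is small.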

 

Google the main part of the error message.

Posted

set hive.execution.engine=tez;
set tez.am.resource.memory.mb=59205;
set tez.am.launch.cmd-opts=-Xmx47364m;
set hive.tez.container.size=5120;
set hive.tez.java.opts=-Xmx4096m;
set tez.task.resource.memory.mb=10000;
set hive.auto.convert.join.noconditionaltask.size=1000000000;
set tez.am.grouping.max-size=36700160000;
--set tez.grouping.max-size=1073741824;
--set tez.grouping.min-size=52428800;
set tez.runtime.io.sort.mb=2048;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
 

I used these settings and was able to reduce the run time by a third ^^
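
As a quick sanity check on settings like these, the sketch below parses three of the `set` lines and shows how the heap and sort buffer relate to the container size. Only values posted above are used; the parser and its function name are hypothetical helpers, not part of Hive or Tez.

```python
import re

SETTINGS = """
set hive.tez.container.size=5120;
set hive.tez.java.opts=-Xmx4096m;
--set tez.grouping.max-size=1073741824;
set tez.runtime.io.sort.mb=2048;
"""

def parse_settings(text):
    """Collect key/value pairs from 'set key=value;' lines, skipping '--' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("set "):
            continue
        key, _, value = line[len("set "):].rstrip(";").partition("=")
        settings[key.strip()] = value.strip()
    return settings

conf = parse_settings(SETTINGS)
container_mb = int(conf["hive.tez.container.size"])
heap_mb = int(re.fullmatch(r"-Xmx(\d+)m", conf["hive.tez.java.opts"]).group(1))
sort_mb = int(conf["tez.runtime.io.sort.mb"])

heap_ratio = heap_mb / container_mb   # share of the container given to the JVM heap
sort_ratio = sort_mb / container_mb   # share of the container given to the sort buffer
print(heap_ratio, sort_ratio)         # 0.8 0.4
```

The heap is 80% of the container, which leaves about 1 GB of headroom for off-heap memory, and the sort buffer is 40% of the container.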
