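Article list
@爱摩王涛: The power of data is the commanding height of future business, and its foundation is cloud computing. //@数据化管理: "From business intelligence to consumer intelligence": in the era of business intelligence, enterprises collect all kinds of data to support their own decisions. In the era of consumer intelligence, data analysis will be offered by enterprises to consumers as a service that supports the consumers' own purchasing decisions. Bank statement analysis follows this approach. B2C sites can also give individual consumers an analysis of their own purchasing behavior and let them decide for themselves. http://t.cn/zOga2xj Decision support shifting from enterprises to individual users: big data analytics platforms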
http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/

CDH4 HA failover time

blocksize: 35M, filesize: 96M, zk-session-timeout: 10s. Logs: active NN: Wed Sep  5 13:20:25 CST 2012; zk: [zk: localhost:2181(CONNECTED) 19] get /hadoop-ha/mycluster/ActiveStandbyElectorLock myclusternn1bd10 \ufffdF(\ufffd> cZxid = 0xd90 ctime = Wed Sep 05 13:20:58 CST 2012 mZxid = 0xd90 mtime = W ...

CDH4 HA failover

 
HA failover problem: the failover takes too long... copy 0 ... Wed Sep  5 10:30:01 CST 2012 copy 1 ... Wed Sep  5 10:30:18 CST 2012 copy 2 ... Wed Sep  5 10:30:57 CST 2012 12/09/05 10:47:24 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediat ...
According to the logs, the Standby NN startup process is: 1. Obtain the Active NN checkpoint information 2. Register the live nodes in memory 3. The Standby NN enters Safe Mode 4. Fetch block information from the DataNodes 5. Leave Safe Mode. Checkpointing active NN at bigdata-4:50070 Serving checkpoints at bigdata-3/172.16.206.206:50070 2012-08-02 11:07:24,761 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.r ...
Environment: the active node is killed while data is being written. Analysis: the connection to the Active is broken and the Active returns no response. This exception needs to be caught and handled; a sleep can be added to give the Standby time to switch to Active. Logs: 2012-08-02 10:50:28,961 WARN  ipc.Client (Client.java:run(787)) - Unexpected error reading responses on connection Thread[IPC Client (591210723) connection to bigdata-4/172.16.206 ...
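The "catch the exception and sleep" idea from the analysis above can be sketched as follows. This is a minimal illustration in Python rather than the Hadoop client API: `call_with_retry`, its parameters, and the use of `ConnectionError` as the failure signal are all hypothetical names chosen for the example.

```python
import time


def call_with_retry(operation, max_attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call operation(); on ConnectionError, sleep and then try again.

    The sleep gives a standby node time to become active before the
    next attempt. All names here are hypothetical, not Hadoop APIs.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError as err:
            last_error = err
            # Only sleep if another attempt is still to come.
            if attempt < max_attempts:
                sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

The delay would reasonably be chosen relative to the ZooKeeper session timeout, since the standby cannot take over before the old session expires; the right value depends on the cluster.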

CDH4 HA test

Scenario: NN HA was configured successfully, but the client throws an exception during HA failover. Error analysis: a problem with the shell script the user ran. Logs: client 2012-08-01 14:37:07,798 WARN  ipc.Client (Client.java:run(787)) - Unexpected error reading responses on connection Thread[IPC Client (1333933549) connection to bigdata-3/172.16.206.206:9000 from peter,5,main] java.lang.NullPointerEx ...

Hadoop TextOutput

 
TextOutputFormat separator parameter: mapreduce.output.textoutputformat.separator
StreamXmlRecordReader: set the property stream.recordreader.class=org.apache.hadoop.streaming.StreamXmlRecordReader. For details see http://mahout.apache.org/ XMLInputFormat
NLineInputFormat overrides the split computation. Parameter: mapreduce.input.lineinputformat.linespermap. Use case: for example, a data source file is created and each Map processes one line, connecting to a different database. With the number of Reduces set to 0, this is a map-only job.
Key/value separator: mapreduce.input.keyvaluelinerecordreader.key.value.separator
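To make the two separator settings concrete, here is a small sketch of what they control. It is plain Python that imitates the behavior rather than calling Hadoop: `split_key_value` and `format_record` are hypothetical helper names, and the tab default is the Hadoop default for both properties.

```python
def split_key_value(line, separator="\t"):
    """Split a line at the first occurrence of the separator.

    Mirrors how a key/value line reader treats its input: text before
    the first separator is the key, the rest is the value. If the
    separator is absent, the whole line is the key and the value is empty.
    """
    key, _, value = line.partition(separator)
    return key, value


def format_record(key, value, separator="\t"):
    """Join a key and a value the way a text output format writes them."""
    return f"{key}{separator}{value}"
```

For instance, `split_key_value("a,b,c", ",")` yields `("a", "b,c")`, because only the first separator is significant.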

Hadoop: controlling the split size

 
Three parameters determine the Map split size: 1. mapred.min.split.size 2. mapred.max.split.size 3. dfs.block.size. The formula is: max(minimumSize, min(maximumSize, blockSize)). By default: minimumSize < blockSize < maximumSize. Examples (min, max, block, split): 1M, 100M, 64M, 64M; 128M, 512M, 64 ...
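The formula above is easy to check with a few lines of Python. This is a sketch of the arithmetic only; `compute_split_size` and `MB` are hypothetical names for this example, and the second call uses made-up values (a maximum below the block size) that do not come from the original post.

```python
MB = 1024 * 1024


def compute_split_size(minimum_size, maximum_size, block_size):
    """Apply max(minimumSize, min(maximumSize, blockSize))."""
    return max(minimum_size, min(maximum_size, block_size))


# First example row from the post: min 1M, max 100M, block 64M.
default_case = compute_split_size(1 * MB, 100 * MB, 64 * MB)

# Hypothetical values: a maximum below the block size caps the split.
capped_case = compute_split_size(1 * MB, 32 * MB, 64 * MB)

print(default_case // MB, capped_case // MB)
```

In the default ordering minimumSize < blockSize < maximumSize the split equals the block size, which is why the settings only matter once the minimum is raised above the block size or the maximum is lowered below it.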
Setting up Disks for Hadoop. Here are some recommendations for setting up disks in a Hadoop cluster. What we have here is anecdotal; hard evidence is very welcome, and everyone should expect a bit of trial and error work. Key Points: Goals for a Hadoop cluster are normally massive amounts of data wi ...
Compatibility: when moving from one release to another, you need to consider the upgrade steps that are needed. 1. API compatibility 2. Data compatibility 3. Wire compatibility
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html  