Hive: exception when extracting a substring from a Hive column
I have a column named category that contains data like this:
Failed extract of third-party root list from auto update cab at: <http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: The data is invalid.
I need to select the URL portion between the < and > symbols in the category column. I wrote this Hive query:
select level,category,regexp_extract(category,'http://[^\>]*') AS url from event where level='Error';
It fails with an exception:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201406122248_0014, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201406122248_0014
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201406122248_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-13 02:13:35,696 Stage-1 map = 0%, reduce = 0%
2014-06-13 02:14:13,895 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201406122248_0014 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201406122248_0014
Examining task ID: task_201406122248_0014_m_000002 (and more) from job job_201406122248_0014
Task with the most failures(4):
-----
Task ID:
task_201406122248_0014_m_000000
URL:
http://localhost.localdomain:50030/taskdetails.jsp?jobid=job_201406122248_0014&tipid=task_201406122248_0014_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"level":"Error","datetimes":"6/13/2014 9:24:05 AM","source":"Microsoft-Windows-CAPI2","eventid":4107,"task":"None","category":"\"Failed extract of third-party root list from auto update cab at: <http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: The data is invalid."}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
How can I fix this?
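Please help.

Comments:
You could try clearing the cache and re-running. The command is certutil -urlcache * delete
I tried that command in Hive and got ParseException: cannot recognize input near 'certutil' '-' 'urlcache'
It is not a Hive command. On Windows you can run it from CMD.
Still getting the same error.
Do you mean you ran the certutil command, then re-ran, and it failed again? If so, then it is worth a look. You could try cleaning the rest of the cache as mentioned earlier.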
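
A possible explanation, which the truncated stack trace above cannot confirm: the failure may come from the query rather than from the Windows certificate cache. When regexp_extract is called with two arguments, Hive uses group index 1 by default, but the pattern 'http://[^\>]*' contains no capturing group, so asking for group 1 can raise a runtime error on the first row that matches. Below is a minimal sketch of two ways to rewrite the query, reusing the event table and the level and category columns from the question. Either pass index 0 to return the whole match, or add a capturing group and request group 1.

```sql
-- Option 1: keep a group-free pattern and request index 0 (the entire match).
SELECT level,
       category,
       regexp_extract(category, 'http://[^>]*', 0) AS url
FROM event
WHERE level = 'Error';

-- Option 2: capture whatever sits between < and > and request group 1.
-- This variant does not assume the URL starts with http://.
SELECT level,
       category,
       regexp_extract(category, '<([^>]*)>', 1) AS url
FROM event
WHERE level = 'Error';
```

Both patterns avoid the backslash before >, which the regex does not need and which Hive string literals may treat as an escape character. If the error persists after this change, the full "Caused by" section of the task log would be the next thing to check.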