之前的一篇博客 中,遇到1000分区的分区表更新和删除都执行失败(问题3)的问题。经过简单的调查发现原因竟然是OOM导致进程被杀。以下是相关错误消息
1.错误现象
命令行( SQL执行失败):
- -bash-4.1$ pgbench -n -r -c 10 -j 10 -T 2 -f update.sql db1000
- Client 2 aborted in state 1. Probably the backend died while processing.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- Client 3 aborted in state 1. Probably the backend died while processing.
- Client 0 aborted in state 1. Probably the backend died while processing.
- Client 9 aborted in state 1. Probably the backend died while processing.
- Client 8 aborted in state 1. Probably the backend died while processing.
- Client 7 aborted in state 1. Probably the backend died while processing.
- Client 1 aborted in state 1. Probably the backend died while processing.
- Client 4 aborted in state 1. Probably the backend died while processing.
- Client 6 aborted in state 1. Probably the backend died while processing.
- Client 5 aborted in state 1. Probably the backend died while processing.
- transaction type: Custom query
- scaling factor: 1
- query mode: simple
- number of clients: 10
- number of threads: 10
- duration: 2 s
- number of transactions actually processed: 0
- tps = 0.000000 (including connections establishing)
- tps = 0.000000 (excluding connections establishing)
- statement latencies in milliseconds:
- 0.271800 \setrandom id 1 10000000
- 0.000000 update maintb set name = 'aaaaa12345' where id=:id;
pg服务端日志(服务端进程被杀并重启):
- LOG: server process (PID 18565) was terminated by signal 9: Killed
- DETAIL: Failed process was running: update maintb set name = 'aaaaa12345' where id=150458;
- LOG: terminating any other active server processes
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- WARNING: terminating connection because of crash of another server process
- DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
- HINT: In a moment you should be able to reconnect to the database and repeat your command.
- LOG: all server processes terminated; reinitializing
- LOG: database system was interrupted; last known up at 2015-01-16 14:18:26 CST
- LOG: database system was not properly shut down; automatic recovery in progress
- LOG: redo starts at 3/4A153AC8
- LOG: record with zero length at 3/4AD01498
- LOG: redo done at 3/4AD01468
- LOG: last completed transaction was at log time 2015-01-16 14:35:11.990522+08
- LOG: database system is ready to accept connections
- LOG: autovacuum launcher started
/var/log/messages(OOM): - Jan 16 14:35:40 hanode1 kernel: Out of memory: Kill process 18565 (postgres) score 94 or sacrifice child
- Jan 16 14:35:40 hanode1 kernel: Killed process 18565, UID 26, (postgres) total-vm:608864kB, anon-rss:150480kB, file-rss:3180kB
2.PG进程究竟分配了多少内存?
通过debug PG9.4的代码,发现生成分区表的1个update语句的执行计划居然耗费了717M的内存。
sql:
- db1000=# update maintb set name='aaaaa12345' where id =3;
- UPDATE 19
src/backend/tcop/postgres.c:
- /*
- * OK to analyze, rewrite, and plan this query.
- *
- * Switch to appropriate context for constructing querytrees (again,
- * these must outlive the execution context).
- */
- oldcontext = MemoryContextSwitchTo(MessageContext);
-
- querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
- NULL, 0);
-
- plantree_list = pg_plan_queries(querytree_list, 0, NULL);
-
- /* Done with the snapshot used for parsing/planning */
- if (snapshot_set)
- PopActiveSnapshot();
在上面的pg_plan_queries()调用前后分别在gdb中执行“call MemoryContextStats(MessageContext)”,从输出结果可以看出 pg_plan_queries分配了717M内存。
- MessageContext: 16384 total in 2 blocks; 9976 free (9 chunks); 6408 used
- MessageContext: 721420288 total in 96 blocks; 3715776 free (13 chunks); 717704512 used
1个客户端700多M ,10个客户端 并行执行就是7个G ,我的VM只有2G 内存,难怪被kill掉 。 3.问题在哪里?
同样的方法,测量了另外两种情况的内存消耗。
100个分区的update sql:
- db100=# update maintb set name='aaaaa12345' where id =3;
- UPDATE 8
生成执行计划消耗的内存:8M
- MessageContext: 24576 total in 2 blocks; 18168 free (9 chunks); 6408 used
- MessageContext: 8380416 total in 10 blocks; 5056 free (3 chunks); 8375360 used
从8M到700M,不难看出:分区表update的执行计划消耗内存与分区数目n是n^2的倍数关系。 进一步细化,发现关键点在下面两处内存分配。 src/backend/optimizer/plan/planner.c - static Plan *
- inheritance_planner(PlannerInfo *root)
- {
- ...
- foreach(lc, root->append_rel_list)//循环1001次
- {
- AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
- PlannerInfo subroot;
- Plan *subplan;
- Index rti;
-
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
- /*
- * We need a working copy of the PlannerInfo so that we can control
- * propagation of information back to the main copy.
- */
- memcpy(&subroot, root, sizeof(PlannerInfo));
-
- /*
- * Generate modified query with this rel as target. We first apply
- * adjust_appendrel_attrs, which copies the Query and changes
- * references to the parent RTE to refer to the current child RTE,
- * then fool around with subquery RTEs.
- */
- subroot.parse = (Query *)
- adjust_appendrel_attrs(root,
- (Node *) parse,
- appinfo);//大约分配了300K
-
- /*
- * The rowMarks list might contain references to subquery RTEs, so
- * make a copy that we can apply ChangeVarNodes to. (Fortunately, the
- * executor doesn't need to see the modified copies --- we can just
- * pass it the original rowMarks list.)
- */
- subroot.rowMarks = (List *) copyObject(root->rowMarks);
-
- /*
- * The append_rel_list likewise might contain references to subquery
- * RTEs (if any subqueries were flattenable UNION ALLs). So prepare
- * to apply ChangeVarNodes to that, too.
- */
- subroot.append_rel_list = (List *) copyObject(root->append_rel_list);//大约分配了400K
-
- /*
- * Add placeholders to the child Query's rangetable list to fill the
- * RT indexes already reserved for subqueries in previous children.
- * These won't be referenced, so there's no need to make them very
- * valid-looking.
- */
- while (list_length(subroot.parse->rtable) list_length(final_rtable))
- subroot.parse->rtable = lappend(subroot.parse->rtable,
- makeNode(RangeTblEntry));
- ...
- }
4. select为什么没有问题?
select和update走的是不同的分支,没有走上面的inheritance_planner()函数,内存消耗只有7M。 1000个分区的select sql:
- db1000=# select * from maintb where id =3;
- id | name
- ----+------------
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- 3 | aaaaa12345
- (19 rows)
生成 执行计划消耗的内存:7M
- MessageContext: 8192 total in 1 blocks; 1864 free (1 chunks); 6328 used
- MessageContext: 8388608 total in 11 blocks; 1176776 free (6 chunks); 7211832 used
update的调用栈:
点击( 此处 )折叠或打开 - (gdb) bt
- #0 inheritance_planner (root=0x1fd6548) at planner.c:787
- #1 0x00000000005fe7c0 in subquery_planner (glob=0x1fd64b8, parse=0x1fd59e0, parent_root=value optimized out>,
- hasRecursion=value optimized out>, tuple_fraction=0, subroot=0x7fffbf4c70b8) at planner.c:569
- #2 0x00000000005fe935 in standard_planner (parse=0x1fd59e0, cursorOptions=0, boundParams=0x0) at planner.c:210
- #3 0x00000000006784fa in pg_plan_query (querytree=value optimized out>, cursorOptions=value optimized out>,
- boundParams=value optimized out>) at postgres.c:750
- #4 0x00000000006785e4 in pg_plan_queries (querytrees=value optimized out>, cursorOptions=0, boundParams=0x0)
- at postgres.c:809
- #5 0x0000000000678bcc in exec_simple_query (
- query_string=0x1fd4b30 "update maintb set name='aaaaa12345' where id =3;") at postgres.c:974
- #6 0x000000000067a179 in PostgresMain (argc=value optimized out>, argv=value optimized out>,
- dbname=0x1f70fb0 "db1000", username=value optimized out>) at postgres.c:4016
- #7 0x00000000006283cb in BackendRun (argc=value optimized out>, argv=value optimized out>) at postmaster.c:4123
- #8 BackendStartup (argc=value optimized out>, argv=value optimized out>) at postmaster.c:3797
- #9 ServerLoop (argc=value optimized out>, argv=value optimized out>) at postmaster.c:1576
- #10 PostmasterMain (argc=value optimized out>, argv=value optimized out>) at postmaster.c:1223
- #11 0x00000000005c0cd8 in main (argc=3, argv=0x1f6fa10) at main.c:227
select的调用栈:
- (gdb) bt
- #0 grouping_planner (root=0x1fd6568, tuple_fraction=0) at planner.c:1080
- #1 0x00000000005fe5b4 in subquery_planner (glob=0x1fd64d8, parse=0x1fd5b20, parent_root=value optimized out>,
- hasRecursion=value optimized out>, tuple_fraction=0, subroot=0x7fffbf4c70b8) at planner.c:572
- #2 0x00000000005fe935 in standard_planner (parse=0x1fd5b20, cursorOptions=0, boundParams=0x0) at planner.c:210
- #3 0x00000000006784fa in pg_plan_query (querytree=value optimized out>, cursorOptions=value optimized out>,
- boundParams=value optimized out>) at postgres.c:750
- #4 0x00000000006785e4 in pg_plan_queries (querytrees=value optimized out>, cursorOptions=0, boundParams=0x0)
- at postgres.c:809
- #5 0x0000000000678bcc in exec_simple_query (query_string=0x1fd4b30 "select * from maintb where id =3;")
- at postgres.c:974
- #6 0x000000000067a179 in PostgresMain (argc=value optimized out>, argv=value optimized out>,
- dbname=0x1f70fb0 "db1000", username=value optimized out>) at postgres.c:4016
- #7 0x00000000006283cb in BackendRun (argc=value optimized out>, argv=value optimized out>) at postmaster.c:4123
- #8 BackendStartup (argc=value optimized out>, argv=value optimized out>) at postmaster.c:3797
- #9 ServerLoop (argc=value optimized out>, argv=value optimized out>) at postmaster.c:1576
- #10 PostmasterMain (argc=value optimized out>, argv=value optimized out>) at postmaster.c:1223
- #11 0x00000000005c0cd8 in main (argc=3, argv=0x1f6fa10) at main.c:227
5.最后
按目前PG对分区表的处理逻辑,确实处理不了很多分区的情况。(但是update的那2处内存分配应该是可以优化的)。
6.参考
http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=20726500&id=4732253