VAX 4000-600A: 2. Dealing with errors, the boot console
SuperMax, 23.5.2015, 18:19, post #1
Administrator. Group: Root Admin. Posts: 6 295. Registered: 7.1.2006. From: Krasnoyarsk. Member #: 1
So, the server inspection is done. Let's power it on:
Code:
KA691-A V2.3, VMB 2.14
Performing normal system tests.
70..69..68..67..66..65..64..63..62..61..60..59..58..57..56..55..
54..53..52..51..50..49..48..47..46..45..44..43..42..41..40..39..
38..37..36..35..34..33..32..31..30..29..28..27..26..25..24..23..
? Test_Subtest_40_06 Loop_Subtest=00 Err_Type=FF DE_Memory_count_pages.lis
Vec=0000 Prev_Errs=0000
P1=00000001 P2=00000001 P3=7FFFFFFF P4=00000000 P5=00004000
P6=00040000 P7=00004000 P8=00000000 P9=00000000 P10=00000000
r0=00000002 r1=00000001 r2=00000000 r3=FFFFFFFF r4=000000A0 r5=00000000
r6=00000000 r7=00000000 r8=00000000 r9=20140758 r10=00000000 r11=2014044B
dser=0000 cesr=00000000 icsr=01 pcsts=F800 pcctl=FC13 cctl=00000021
bcetsts=0000 bcedsts=0000 cefsts=00000200 nests=00 mmcdsr=01111000 mesr=00085000
Normal operation not possible.
>>>

Let's look around the boot console. Remembering RT-11, we type:

Code:
>>>help
Following is a brief summary of all the commands supported by the console:
UPPERCASE denotes a keyword that you must type in
| denotes an OR condition
[] denotes optional parameters
<> denotes a field specifying a syntactically correct value
.. denotes one of an inclusive range of integers
... denotes that the previous item may be repeated

Valid qualifiers:
/B /W /L /Q /INSTRUCTION /G /I /V /P /M /STEP: /N: /NOT /WRONG /U

Valid commands:
BOOT [[/R5:]<boot_flags>] [<boot_device>]
CONFIGURE
CONTINUE
DEPOSIT [<qualifiers>] <address> <datum> [<datum>...]
EXAMINE [<qualifiers>] [<address>]
FIND [/MEMORY | /RPB]
HALT
HELP
INITIALIZE
LOGIN
MOVE [<qualifiers>] <address> <address>
NEXT [<count>]
REPEAT <command>
SEARCH [<qualifiers>] <address> <pattern> [<mask>]
SET BFLG <boot_flags>
SET BOOT <boot_device>
SET CONTROLP <0..1 | DISABLED | ENABLED>
SET DSSI_ID <bus_number> <id> ! Type F for ID to use BUS ID plugs
SET HALT <0..4 | DEFAULT | RESTART | REBOOT | HALT | RESTART_REBOOT>
SET HOST/DUP/DSSI/BUS:<0..3> <node_number> [<task>]
SET HOST/DUP/UQSSP </DISK | /TAPE> <controller_number> [<task>]
SET HOST/DUP/UQSSP <physical_CSR_address> [<task>]
SET HOST/MAINTENANCE/UQSSP/SERVICE <controller_number>
SET HOST/MAINTENANCE/UQSSP <physical_CSR_address>
SET LANGUAGE <1..15>
SET PSE <0..1 | DISABLED | ENABLED>
SET PSWD
SET RECALL <0..1 | DISABLED | ENABLED>
SHOW BFLG
SHOW BOOT
SHOW CONFIG
SHOW CONTROLP
SHOW DEVICE
SHOW DSSI [0..3]
SHOW DSSI_ID
SHOW ERRORS
SHOW ETHERNET
SHOW HALT
SHOW LANGUAGE
SHOW MEMORY [/FULL]
SHOW PSE
SHOW QBUS
SHOW RECALL
SHOW RLV12
SHOW SAVED_STATE
SHOW SCSI
SHOW TESTS
SHOW TRANSLATION <physical_address>
SHOW UQSSP
SHOW VERSION
START <address>
TEST <test_code> [<parameters>]
UNJAM
X <address> <count>
>>>

Excellent! Let's look around:

Code:
>>>SHOW VERSION
KA691-A V2.3, VMB 2.14
>>>SHOW DEVICE
DSSI Bus 0 Node 0 (SYSTEM)
-DIA0 (RF36)
DSSI Bus 0 Node 1 (USER)
-DIA1 (RF36)
DSSI Bus 0 Node 2 (TROLL2)
-DIA2 (RF36)
DSSI Bus 0 Node 3 (TROLL3)
-DIA3 (RF36)
DSSI Bus 0 Node 4 (TROLL4)
-DIA4 (RF36)
DSSI Bus 0 Node 6 (T88ZQ0)
-MIA6 (TF85)
DSSI Bus 0 Node 7 (*)
DSSI Bus 1 Node 7 (*)
SCSI Adapter 0 (761400), SCSI ID 7
Ethernet Adapter
-EZA0 (08-00-2B-3E-98-E9)

Let's see what we have on the bus:

Code:
>>>SHOW QBUS
Scan of Qbus I/O Space
-20000120 (760440) = 1F80 DHQ11/DHV11/CXA16/CXB16/CXY08
-20000122 (760442) = F081
-20000124 (760444) = DD18
-20000126 (760446) = 0000
-20000128 (760450) = 0000
-2000012A (760452) = 0000
-2000012C (760454) = 8000
-2000012E (760456) = 0000
-20000130 (760460) = 1F80 DHQ11/DHV11/CXA16/CXB16/CXY08
-20000132 (760462) = F081
-20000134 (760464) = DD18
-20000136 (760466) = 0000
-20000138 (760470) = 0000
-2000013A (760472) = 0000
-2000013C (760474) = 8000
-2000013E (760476) = 0000
-200001E0 (760740) = 0002 DSV11
-200001E2 (760742) = 0005
-200001E4 (760744) = AAAA
-200001E6 (760746) = 0000
-200001E8 (760750) = 0002 DSV11
-200001EA (760752) = 0005
-200001EC (760754) = AAAA
-200001EE (760756) = 0000
-20000300 (761400) = 0000 KZQSA
-20000302 (761402) = 0000
-20000304 (761404) = 0000
-20000306 (761406) = 0000
-20000308 (761410) = 0007
-2000030A (761412) = 0000
-2000030C (761414) = 0000
-2000030E (761416) = 0000
-20000310 (761420) = 0000
-20000312 (761422) = 0000
-20000314 (761424) = 0000
-20000316 (761426) = 0000
-20000318 (761430) = 0000
-2000031A (761432) = 0000
-2000031C (761434) = 0000
-2000031E (761436) = 0000
-20000320 (761440) = 0000
-20000322 (761442) = 0000
-20000324 (761444) = 0400
-20000326 (761446) = 1004
-20000328 (761450) = 0000
-2000032A (761452) = 0000
-2000032C (761454) = 0000
-2000032E (761456) = 5049
-20000330 (761460) = 8420
-20000332 (761462) = 0000
-20000334 (761464) = 0100
-20000336 (761466) = 0080
-20000338 (761470) = 1E00
-2000033A (761472) = FFFF
-2000033C (761474) = 0000
-2000033E (761476) = 14C0
-20000380 (761600) = 0000 DEFQA
-20000382 (761602) = 0000
-20000384 (761604) = 0000
-20000386 (761606) = 0000
-20000388 (761610) = 0000
-2000038A (761612) = 0000
-2000038C (761614) = 0000
-2000038E (761616) = 0000
-20000390 (761620) = 0000
-20000392 (761622) = 0000
-20000394 (761624) = 0200
-20000396 (761626) = 0000
-20000398 (761630) = 00D2
-2000039A (761632) = 0000
-2000039C (761634) = 0000
-2000039E (761636) = 0000
-200003A0 (761640) = 0000
-200003A2 (761642) = 0000
-200003A4 (761644) = 0000
-200003A6 (761646) = 0000
-200003A8 (761650) = 0000
-200003AA (761652) = 0000
-200003AC (761654) = 0000
-200003AE (761656) = 0000
-200003B0 (761660) = 0000
-200003B2 (761662) = 0000
-200003B4 (761664) = 0000
-200003B6 (761666) = 0000
-200003B8 (761670) = 0000
-200003BA (761672) = 0000
-200003BC (761674) = 0000
-200003BE (761676) = 0000
-20001F40 (777500) = 0020 IPCR
Scan of Qbus Memory Space
-30040000 to 3005FFFF (01000000 to 01377777)
>>>SHOW PSE
Disabled

Now the hardware configuration is clear. Something has to be done about the errors. To start, let's see which tests exist at all:

Code:
>>>show test
Test # Address Name Parameters
___________________________________________________________________________
20051600 SCB
20054C3C De_executive
30 20069924 Memory_Init_Bitmap *** mark_Hard_SBEs ******
31 20069F1C Memory_Setup_CSRs *********
32 2005C530 NMC_registers **********
33 2005C70C NMC_powerup **
34 2005DAC0 SSC_ROM ***
35 2005EF78 B_Cache_diag_mode bypass_test_mask *********
37 20060978 Cache_w_Memory bypass_test_mask *********
3F 2006B954 Mem_FDM_Addr_shorts *** cont_on_err ******
40 2006A9DC Memory_count_pages First_board Last_bd Soft_errs_allowed *******
41 200680D4 Board_Reset *
42 20060C68 Chk_for_Interrupts **********
46 200604AC P_Cache_diag_mode bypass_test_mask *********
47 2006A0EC Memory_Refresh start_a end incr cont_on_err time_seconds *****
48 2006A410 Memory_Addr_shorts start_add end_add * cont_on_err pat2 pat3 ****
49 2006B2E0 Memory_FDM *** cont_on_err ******
4A 20069624 Memory_ECC_SBEs start_add end_add add_incr cont_on_err ******
4B 200687F4 Memory_Byte_Errors start_add end_add add_incr cont_on_err ******
4C 20068F88 Memory_ECC_Logic start_add end_add add_incr cont_on_err ******
4D 200683D0 Memory_Address start_add end_add add_incr cont_on_err ******
4E 20068570 Memory_Byte start_add end_add add_incr cont_on_err ******
4F 2006ABFC Memory_Data start_add end_add add_incr cont_on_err ******
51 20057424 FPA **********
52 20057918 SSC_Prog_timers which_timer wait_time_us ***
53 20057C00 SSC_TOY_Clock repeat_test_250ms_ea Tolerance ***
54 20057000 Virtual_Mode ********
55 20058254 Interval_Timer *****
56 20064464 SHAC_LPBCK From_bus To_bus passes *******
58 2006510C SHAC_RESET dssi_bus port_number time_secs not_pres *
59 20061B60 SGEC_LPBCK_ASSIST time_secs **
5C 200620F8 SHAC SHAC_number *********
5F 20060DA0 SGEC loopback_type no_ram_tests ******
60 2006D928 SSC_Console_SLU start_BAUD end_BAUD *******
62 20057F04 console_QDSS mark_not_present selftest_r0 selftest_r1 *****
63 2005808C QDSS_any input_csr selftest_r0 selftest_r1 ******
80 2005C7A8 CQBIC_memory bypass_test_mask *********
81 20058AB4 Qbus_MSCP IP_csr ******
82 20058C94 Qbus_DELQA device_num_addr ****
83 20059C44 QZA_Intlpbck1 controller_number ********
84 2005B304 QZA_Intlpbck2 controller_number *********
85 20058E84 QZA_memory incr test_pattern controller_number *******
86 2005932C QZA_DMA Controller_number main_mem_buf ********
90 2005787C CQBIC_registers *
91 200577F8 CQBIC_powerup **
99 2005D034 Flush_Ena_Caches dis_flush_VIC dis_flush_BC dis_flush_PC
9A 20063398 INTERACTION pass_count disable_device *******
9B 20068230 Init_memory **
9C 20065704 List_CPU_registers *
9D 2006C280 Utility Flags *********
9E 20058424 List_diagnostics script_number *
9F 200675B4 Create_A0_Script **********
C1 20056C70 SSC_RAM_Data *
C2 20056E60 SSC_RAM_Data_Addr *
C5 20057DD0 SSC_registers *
D0 20060058 V_Cache_diag_mode bypass_test_mask *********
D2 2005D278 O_Bit_diag_mode bypass_test_mask *********
DA 20060784 PB_Flush_Cache **********
DB 2005DC38 Speed print_speed *********
DC 2006C208 NO_Memory_present ********
DD 2005E4C4 B_Cache_Data_debug start_add end_add add_incr *******
DE 2005E04C B_Cache_Tag_Debug start_add end_add add_incr *******
DF 2005D690 O_BIT_DEBUG start_add end_add add_incr seg_incr ******

Scripts
# Description
A0 User defined scripts
A1 Powerup tests, Functional Verify, continue on error, numeric countdown
A3 Functional Verify, stop on error, test # announcements
A4 Loop on A3 Functional Verify
A6 Memory tests, mark only multiple bit errors
A7 Memory tests
A8 Memory acceptance tests, mark single and multi-bit errors, call A7
A9 Memory tests, stop on error
B5 Extended tests, then loop
Load & start system exerciser
100 Customer mode, 2 passes
101 CSSE mode, 2 passes
102 CSSE mode, continous until ^C
103 Manuf mode, continous until ^C
104 Manuf TINA mode, continous until ^C
105 Manuf mode, 2 passes
106 CSSE mode, select tests, continous until ^C
107 Manuf mode, select tests, continous until ^C

What a huge list!
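Every register in the SHOW QBUS scan above is printed twice: once as a KA691 physical address and once, in parentheses, as an octal Qbus address. The pairs in the scan (760440 → 20000120, 777500 → 20001F40, and Qbus memory 01000000 → 30040000) suggest that the 8 KB I/O page starting at octal 760000 sits at 0x20000000 and that Qbus memory space starts at 0x30000000. The Python sketch below only encodes that inferred mapping. The base constants and the function names are my own reading of this output, not taken from DEC documentation.

```python
# Sketch: map octal Qbus addresses (as printed in parentheses by SHOW QBUS)
# to the KA691 physical addresses printed next to them.
# Assumption: the bases below are inferred from the scan output above.

QBUS_IO_PAGE_BASE = 0o760000   # first address of the 8 KB Qbus I/O page
QBUS_IO_PAGE_END = 0o777777    # last address of the I/O page
VAX_IO_BASE = 0x20000000       # where the console shows the I/O page
VAX_QMEM_BASE = 0x30000000     # where the console shows Qbus memory space


def qbus_io_to_vax(octal_addr: str) -> int:
    """Convert an I/O page address such as '760440' to a VAX physical address."""
    addr = int(octal_addr, 8)
    if not QBUS_IO_PAGE_BASE <= addr <= QBUS_IO_PAGE_END:
        raise ValueError(f"{octal_addr} is outside the Qbus I/O page")
    return VAX_IO_BASE + (addr - QBUS_IO_PAGE_BASE)


def qbus_mem_to_vax(octal_addr: str) -> int:
    """Convert a Qbus memory address such as '01000000' to a VAX physical address."""
    return VAX_QMEM_BASE + int(octal_addr, 8)


if __name__ == "__main__":
    # CSRs taken from the SHOW QBUS listing
    for dev, csr in [("DHQ11", "760440"), ("KZQSA", "761400"),
                     ("DEFQA", "761600"), ("IPCR", "777500")]:
        print(f"{dev:6} {csr} -> {qbus_io_to_vax(csr):08X}")
```

Run as a script, it prints 20000120, 20000300, 20000380 and 20001F40, which match the left-hand column of the scan. The same arithmetic explains why the KZQSA appears in SHOW DEVICE as "SCSI Adapter 0 (761400)".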
But difficulties don't scare us, so we run all the tests and... the memory error goes away. (Actually, I should have started with the UNJAM and INITIALIZE commands, but I only read about that later.)

Code:
>>>SHOW MEMORY /fill/llif/full
Memory board 0: 00000000 to 07FFFFFF, 128MB, 262144 good pages, 0 bad pages
Total of 128MB, 262144 good pages, 0 bad pages, 160 reserved pages

Memory Bitmap
-07FEC000 to 07FF3FFF, 64 pages

Console Scratch Area
-07FF4000 to 07FF7FFF, 32 pages

Qbus Map
-07FF8000 to 07FFFFFF, 64 pages

Scan of Bad Pages
>>>

Let's see which disk this server used to boot from:

Code:
>>>show boot
DIA5:

Unfortunately, that drive is dead. Let's set the machine to boot from the drive holding the operating system that came with it:

Code:
>>>SET BOOT dia0
>>>show boot
DIA0:

--------------------
As long as we live, we won't die!
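The figures in SHOW MEMORY /FULL are consistent with the 512-byte VAX page. 128 MB is 262144 pages, and the three reserved areas add up to the 160 reserved pages shown in the total line. The short Python check below uses ranges copied from the output above. The helper name `pages` is just for illustration.

```python
# Sketch: cross-check the SHOW MEMORY /FULL numbers, assuming 512-byte VAX pages.
# All address ranges are copied from the console output above (inclusive).

VAX_PAGE = 512  # bytes per VAX page


def pages(start: int, end: int) -> int:
    """Number of VAX pages in an inclusive physical address range."""
    return (end - start + 1) // VAX_PAGE


board0 = pages(0x00000000, 0x07FFFFFF)
reserved = {
    "Memory Bitmap": pages(0x07FEC000, 0x07FF3FFF),
    "Console Scratch Area": pages(0x07FF4000, 0x07FF7FFF),
    "Qbus Map": pages(0x07FF8000, 0x07FFFFFF),
}

print(board0, board0 * VAX_PAGE // 2**20, "MB")
for name, n in reserved.items():
    print(f"{name}: {n} pages")
print("reserved total:", sum(reserved.values()))

# 262144 pages / 8 bits per byte = 32768 bytes = 64 pages, the bitmap size
bitmap_bytes_if_one_bit_per_page = board0 // 8
print("bitmap pages at 1 bit/page:", bitmap_bytes_if_one_bit_per_page // VAX_PAGE)
```

The result is 262144 pages (128 MB), then 64, 32 and 64 reserved pages, for a total of 160. The bitmap size also matches one bit per page of memory. That is an observation from the numbers, not something the console itself states.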
SuperMax, 24.5.2015, 21:20, post #2
Actually, the error turned out to be trickier than that: after a RESET it came back.
Code:
KA691-A V2.3, VMB 2.14
Performing normal system tests.
70..69..68..67..66..65..64..63..62..61..60..59..58..57..56..55..
54..53..52..51..50..49..48..47..46..45..44..43..42..41..40..39..
38..37..36..35..34..33..32..31..30..29..28..27..26..25..24..23..
? Test_Subtest_40_06 Loop_Subtest=00 Err_Type=FF DE_Memory_count_pages.lis
Vec=0000 Prev_Errs=0000
P1=00000001 P2=00000001 P3=7FFFFFFF P4=00000000 P5=00004000
P6=00040000 P7=00004000 P8=00000000 P9=00000000 P10=00000000
r0=00000002 r1=00000001 r2=00000000 r3=FFFFFFFF r4=000000A0 r5=00000000
r6=00000000 r7=00000000 r8=00000000 r9=20140758 r10=00000000 r11=2014044B
dser=0000 cesr=00000000 icsr=01 pcsts=F800 pcctl=FC13 cctl=00000021
bcetsts=0000 bcedsts=0000 cefsts=00000200 nests=00 mmcdsr=01111000 mesr=00085000
Normal operation not possible.

Let's try initializing:

Code:
>>>init
>>>unjam
>>>test a7
9D..49..4F..4E..4D..4C..4B..4A..3F..3F..48..48..48..48..48..48..
48..48..48..48..48..47..40..
? Test_Subtest_40_06 Loop_Subtest=00 Err_Type=FF DE_Memory_count_pages.lis
Vec=0000 Prev_Errs=0001
P1=00000001 P2=00000001 P3=00000001 P4=00000000 P5=00004000
P6=00040000 P7=00004000 P8=00000000 P9=00000000 P10=00000000
r0=00000002 r1=00000001 r2=00000000 r3=FFFFFFFF r4=000000A0 r5=00000000
r6=00000000 r7=00000000 r8=00000000 r9=20140758 r10=00000000 r11=2014044B
dser=0000 cesr=00000000 icsr=01 pcsts=F800 pcctl=FC13 cctl=00000021
bcetsts=0000 bcedsts=0000 cefsts=00000200 nests=00 mmcdsr=01111000 mesr=00085000

Hmm... Then let's start with the most suspicious test:

Code:
30 20069924 Memory_Init_Bitmap *** mark_Hard_SBEs ******

I haven't yet managed to find out what it actually does. The documentation only mentions that parameters can be passed to this test.

Code:
>>>test 30
>>>test a6 1
70..69..68..67..66..65..64..63..62..61..60..59..58..57..56..55..
54..53..52..51..50..49..48..47..46..45..44..43..42..41..40..39..
38..37..36..35..34..33..32..31..30..29..28..27..26..25..24..23..
? Test_Subtest_40_06 Loop_Subtest=00 Err_Type=FF DE_Memory_count_pages.lis
Vec=0000 Prev_Errs=0001
P1=00000001 P2=00000001 P3=7FFFFFFF P4=00000000 P5=00004000
P6=00040000 P7=00004000 P8=00000000 P9=00000000 P10=00000000
r0=00000002 r1=00000001 r2=00000000 r3=FFFFFFFF r4=000000A0 r5=00000000
r6=00000000 r7=00000000 r8=00000000 r9=20140758 r10=00000000 r11=2014044B
dser=0000 cesr=00000000 icsr=01 pcsts=F800 pcctl=FC13 cctl=00000021
bcetsts=0000 bcedsts=0000 cefsts=00000200 nests=00 mmcdsr=01111000 mesr=00085000
>>>test 30
>>>test a6
9D..30..49..4F..4E..4B..4A..4C..3F..3F..48..48..48..48..48..48..
48..48..48..4D..47..40..40..80..
>>>test a7
9D..49..4F..4E..4D..4C..4B..4A..3F..3F..48..48..48..48..48..48..
48..48..48..48..48..47..40..
>>>test a1
70..69..68..67..66..65..64..63..62..61..60..59..58..57..56..55..
54..53..52..51..50..49..48..47..46..45..44..43..42..41..40..39..
38..37..36..35..34..33..32..31..30..29..28..27..26..25..24..23..
22..21..20..19..18..17..16..15..14..13..12..11..10..09..08..07..
06..05..04..03..
>>>

It helped! The resulting memory:

Code:
>>>sh mem /full
Memory board 0: 00000000 to 07FFFFFF, 128MB, 262144 good pages, 0 bad pages
Total of 128MB, 262144 good pages, 0 bad pages, 160 reserved pages

Memory Bitmap
-07FEC000 to 07FF3FFF, 64 pages

Console Scratch Area
-07FF4000 to 07FF7FFF, 32 pages

Qbus Map
-07FF8000 to 07FFFFFF, 64 pages

Scan of Bad Pages
>>>

Further digging showed that errors show up on these tests:

Code:
>>>test 37
? Test_Subtest_37_07 Loop_Subtest=00 Err_Type=FF DE_Cache_w_Memory.lis
Vec=0000 Prev_Errs=0006
P1=00000000 P2=00000000 P3=00000000 P4=00488020 P5=00000021
P6=00000000 P7=00000000 P8=00000000 P9=FFFFF800 P10=00000000
r0=00000002 r1=00000000 r2=201405E6 r3=00000037 r4=20060978 r5=20090000
r6=2005D6A6 r7=00000000 r8=FFFFFFF6 r9=20140758 r10=13000002 r11=2014044B
dser=0000 cesr=00000000 icsr=01 pcsts=F800 pcctl=FC13 cctl=00000021
bcetsts=0000 bcedsts=0000 cefsts=00000200 nests=00 mmcdsr=0110EE00 mesr=00085000
>>>test 47 8
? Test_Subtest_48_11 Loop_Subtest=00 Err_Type=FF DE_Memory_Addr_shorts.lis
Vec=0054 Prev_Errs=0008
P1=00000000 P2=08000000 P3=00000301 P4=00001000 P5=AAAAAAAA
P6=00000000 P7=09D1D000 P8=80085400 P9=00000000 P10=2006A88D
r0=00000002 r1=010080C0 r2=21018044 r3=00001BFA r4=02000000 r5=AAAAAAAA
r6=AAAAAAAA r7=00000000 r8=00000000 r9=20140758 r10=00000000 r11=2014044B
EPC=2006A85D Lis_Add=044D
dser=0000 cesr=00000000 icsr=01 pcsts=F800 pcctl=FC13 cctl=00000021
bcetsts=0000 bcedsts=0000 cefsts=00000200 nests=00 mmcdsr=09D1D000 mesr=80085400
mear=10402037__Add=010080DC Mem_SBE=D45

These errors go away after running the following tests:

Code:
D0 20060058 V_Cache_diag_mode bypass_test_mask *********
D2 2005D278 O_Bit_diag_mode bypass_test_mask *********
DA 20060784 PB_Flush_Cache **********
DB 2005DC38 Speed print_speed *********
DC 2006C208 NO_Memory_present ********
DD 2005E4C4 B_Cache_Data_debug start_add end_add add_incr *******
DE 2005E04C B_Cache_Tag_Debug start_add end_add add_incr *******
DF 2005D690 O_BIT_DEBUG start_add end_add add_incr seg_incr ******

UPD 2016: after ordering new memory modules, the errors went away. Apparently the original module was faulty.
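All the console error reports in this thread share one shape:

- a Test_Subtest_xx_yy header, whose test and subtest codes match the numbers in SHOW TEST;
- the diagnostic's name in a DE_<name>.lis token;
- a run of NAME=HEX register dumps.

A small parser helps when comparing several runs, for example to track Prev_Errs, mesr or P3 from one report to the next. The helper below is hypothetical: `parse_error` and its dict layout are my own invention. Only the input format comes from the console output above.

```python
import re

# "Test_Subtest_40_06" -> test code 40, subtest 06 (hex codes, as in SHOW TEST)
HEADER = re.compile(r"Test_Subtest_([0-9A-F]{2})_([0-9A-F]{2})")
# "DE_Memory_count_pages.lis" -> name of the diagnostic listing
LISTING = re.compile(r"DE_(\w+)\.lis")
# every NAME=HEXVALUE pair: Vec=..., P1=..., r0=..., mesr=..., etc.
FIELD = re.compile(r"(\w+)=([0-9A-Fa-f]+)")


def parse_error(report: str) -> dict:
    """Split one '? Test_Subtest_...' console report into its parts."""
    header = HEADER.search(report)
    if header is None:
        raise ValueError("no Test_Subtest_xx_yy marker in report")
    listing = LISTING.search(report)
    return {
        "test": header.group(1),
        "subtest": header.group(2),
        "name": listing.group(1) if listing else None,
        "fields": dict(FIELD.findall(report)),  # values kept as hex strings
    }


if __name__ == "__main__":
    # excerpt of the report printed after ">>>test 47 8"
    report = ("? Test_Subtest_48_11 Loop_Subtest=00 Err_Type=FF "
              "DE_Memory_Addr_shorts.lis Vec=0054 Prev_Errs=0008 "
              "mesr=80085400 Mem_SBE=D45")
    r = parse_error(report)
    print(r["test"], r["subtest"], r["name"])
    print(r["fields"]["Vec"], r["fields"]["Mem_SBE"])
```

For this excerpt it prints "48 11 Memory_Addr_shorts" and then "0054 D45". The values are left as hex strings on purpose. The console prints them with fixed widths, and comparing them as text keeps that formatting.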