Debugging X on UME Menlow

We've got ubuntu-mobile booting Hardy on the Menlow platform with the -vesa driver, but our goal is to get -psb booted. Currently -psb displays a corrupt screen and locks up the system, which makes troubleshooting a bit of a challenge...

So in hopes of helping anyone else who's troubleshooting this issue, here are some tips I've gathered off the grapevine and/or from code mining. Beware that some of this may be incorrect, but I'll update this as new info comes to light.

I'm using one of Mithrandir's images, and the B0 development board from Intel. For brevity I'll skip all the gory details of troubleshooting the various non-X issues and get to the interesting bits.

1. Boot to single-user mode: UME is configured to automatically boot X, thus locking up the system. To avoid this and get to a console, add this kernel boot option:

init=/bin/bash

2. (Optional) After booting into single-user mode, I like to make the startup stuff a bit more developer friendly.

mount /dev/sda1 /boot
edit /boot/grub/menu.lst
- enable the menu
- change timeout from 3 to 15 sec
- set the failsafe kernel to init=/bin/bash
- remove 'quiet' and 'splash' options
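
For anyone who'd rather script those edits, here is a rough sketch of how most of them could be done with sed. It assumes a GRUB legacy menu.lst that uses "hiddenmenu" and "timeout" lines and kernel lines starting with "kernel"; the function name tweak_menu_lst is made up for illustration, and the failsafe init=/bin/bash change is left as a manual edit.

```shell
#!/bin/sh
# Sketch only: applies the menu.lst tweaks from step 2 to a GRUB legacy
# config. The line formats matched here ("hiddenmenu", "timeout N",
# "kernel ...") are assumptions; check your own file first.
# Requires GNU sed for the -i option.
tweak_menu_lst() {
    menu="${1:-/boot/grub/menu.lst}"
    cp "$menu" "$menu.orig"                     # keep a backup copy
    sed -i \
        -e 's/^hiddenmenu/#hiddenmenu/' \
        -e 's/^timeout[[:space:]].*/timeout 15/' \
        -e '/^[[:space:]]*kernel/ s/[[:space:]]quiet//' \
        -e '/^[[:space:]]*kernel/ s/[[:space:]]splash//' \
        "$menu"
}
```

Run it as "tweak_menu_lst /boot/grub/menu.lst" after mounting /boot, then eyeball the result against menu.lst.orig before rebooting.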

3. While in single user mode, change the default xorg configuration to vesa instead of psb

cp /etc/X11/xorg-crownbeach.conf /etc/X11/xorg-orig.conf
chmod o+w /etc/X11/xorg*.conf
edit /etc/X11/xorg-crownbeach.conf
- change "psb" to "vesa"
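
The driver swap can also be done non-interactively. The sketch below assumes the config names the driver on a line of the form Driver "psb"; the helper name swap_xorg_driver is hypothetical. It only touches Driver lines whose current value matches, so input device drivers are left alone.

```shell
#!/bin/sh
# Sketch only: replace one driver name with another in an xorg config.
# Usage: swap_xorg_driver FILE FROM TO
# Assumes lines look like:   Driver "psb"
# Requires GNU sed for the -i option.
swap_xorg_driver() {
    conf="$1"; from="$2"; to="$3"
    sed -i "s/^\([[:space:]]*Driver[[:space:]]*\)\"$from\"/\1\"$to\"/" "$conf"
}
```

For example, "swap_xorg_driver /etc/X11/xorg-crownbeach.conf psb vesa" does the edit from step 3, and swapping the last two arguments reverses it.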

4. Disable the X autostartup

mkdir /etc/event.d-old/
mv /etc/event.d/session /etc/event.d-old

(I like to also set the root password at this point, but YMMV)

5. Restart. At this point, rather than starting X, it should leave you at a console without a login prompt. Switch to a different tty (ctrl-alt-F2) and log in.

6. To stop/start X:

pkill -9 startx

su -l ume -c "/usr/bin/startx -- -config /etc/X11/xorg-crownbeach.conf"

7. To see what driver is loaded:

grep -A 3 LoadModule /var/log/Xorg.0.log

8. To test -psb (and lock up the system):

cd /etc/X11
su -l ume -c "/usr/bin/startx -- -config xorg-new.conf"

Now for some thoughts on where to go for testing from here:

* The -psb package I posted earlier contains a hack to make it work on Gutsy, by copying libexa.so to /usr/lib/xorg/modules/. So a first question is, is the system using the libexa.so from -psb, or the one from xserver?

It should be possible to force reinstallation of the xorg-server via:

apt-get --reinstall install xserver-xorg-core=2:1.4.1~git20080105-1ubuntu1

And similarly for reinstalling xserver-xorg-video-psb. Check the libexa.so date and size in each case to see if there is any variance, and try booting X with "psb" enabled in each case. If this turns out to be the issue, we can roll a -psb without the libexa copy hack (see the FIXME in -psb).
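
To make the date and size comparison less error prone, something like the following could be run after each reinstall. It is only a sketch: the helper name module_fingerprint is made up, and it relies on GNU stat and md5sum being available in the image.

```shell
#!/bin/sh
# Sketch only: print size, modification time, and md5 checksum of a module
# so the output can be compared between reinstalls.
# Defaults to the libexa.so location mentioned above.
module_fingerprint() {
    f="${1:-/usr/lib/xorg/modules/libexa.so}"
    size=$(stat -c '%s' "$f")                  # size in bytes
    mtime=$(stat -c '%y' "$f")                 # last modification time
    sum=$(md5sum "$f" | cut -d' ' -f1)         # checksum only, no filename
    echo "$size $mtime $sum $f"
}
```

If the line printed after reinstalling xserver-xorg-core differs from the one printed after reinstalling xserver-xorg-video-psb, the -psb package is replacing the file.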

(Unfortunately, amit says networking is broken in this image right now, so I've been unable to try this).

* Run the xserver under gdb while ssh'd into the box remotely, and try getting a backtrace. (Again, this requires a working network connection). See DebuggingXorg for details, particularly the "DRI/drm problems" section.

* fdo has some additional tips for getting debug output from the X server with no network connection.

* Vary xorg.conf parameters, on the off chance we can either get more detailed error output, or bypass the issue. I'll focus on this first since it doesn't require networking.

* The -psb packages apparently work on a moblin kernel. It would be interesting to examine any differences between that setup and one on an ubuntu-mobile kernel, particularly if, for instance, it's a libdrm incompatibility.

Submitted by bryce on Tue, 2008-01-15 22:49.

Thanks amit,

Sounds like something's fishy with that second lock. I disabled it in this patch, and lo and behold, up came X.

Looking at the Xorg.0.log, it is definitely running with the psb driver, and the log contains nothing unexpected. (There are some error and warning messages, but nothing we don't already know about.)

So, since this seems to solve the issue (at least for the present), I've uploaded the fixed -psb to the Ubuntu Mobile PPA.

I played around with some apps, and everything seems to be working fine. 3D performance is poor, but that's expected since we don't have the 3D driver yet. glxgears does run, just slowly (10.5 fps).

bryce | Thu, 2008-01-17 20:20

# echo 1 > /sys/module/drm/parameters/debug

gives us more insight into what is happening inside the kernel:

[drm:drm_stub_open]
[drm:drm_open_helper] pid = 4753, minor = 0
[drm:drm_addmap_core] offset = 0x00000000, size = 0x00002000, type = 2
[drm:drm_addmap_core] 8192 13 f8a37000
[drm:drm_setup]
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
[drm:drm_release] open_count = 1
[drm:drm_release] pid = 4753, device = 0xe200, open_count = 1
[drm:drm_fasync] fd = -1, device = 0xe200
[drm:drm_lastclose]
[drm:drm_lastclose] driver lastclose completed
[drm:drm_lastclose] lastclose completed
[drm:drm_stub_open]
[drm:drm_open_helper] pid = 4753, minor = 0
[drm:drm_addmap_core] offset = 0x00000000, size = 0x00002000, type = 2
[drm:drm_addmap_core] 8192 13 f8a37000
[drm:drm_setup]
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
[drm:drm_release] open_count = 1
[drm:drm_release] pid = 4753, device = 0xe200, open_count = 1
[drm:drm_fasync] fd = -1, device = 0xe200
[drm:drm_lastclose]
[drm:drm_lastclose] driver lastclose completed
[drm:drm_lastclose] lastclose completed
[drm:drm_stub_open]
[drm:drm_open_helper] pid = 4753, minor = 0
[drm:drm_addmap_core] offset = 0x00000000, size = 0x00002000, type = 2
[drm:drm_addmap_core] 8192 13 f8a37000
[drm:drm_setup]
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0106407, nr=0x07, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0086401, nr=0x01, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0086401, nr=0x01, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0106407, nr=0x07, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0186415, nr=0x15, dev 0xe200, auth=1
[drm:drm_addmap_core] offset = 0x00000000, size = 0x00002000, type = 2
[drm:drm_mmap_locked] start = 0xb7b39000, end = 0xb7b3b000, page offset = 0xf8a37
[drm:drm_vm_open_locked] 0xb7b39000,0x00002000
[drm:drm_do_vm_shm_nopage] shm_nopage 0xb7b39000
[drm:drm_do_vm_shm_nopage] shm_nopage 0xb7b3a000
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0186415, nr=0x15, dev 0xe200, auth=1
[drm:drm_addmap_core] offset = 0x3fc00000, size = 0x00001000, type = 0
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0086426, nr=0x26, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0086426, nr=0x26, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0086420, nr=0x20, dev 0xe200, auth=1
[drm:drm_addctx] 1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0x40086422, nr=0x22, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0x4008642a, nr=0x2a, dev 0xe200, auth=1
[drm:drm_lock] 1 (pid 4753) requests lock (0x00000000), flags = 0x00000000
[drm:drm_lock] 1 has lock

This first context (whatever it is) has a successful lock here.

[drm:drm_fasync] fd = 10, device = 0xe200
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0246400, nr=0x00, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0x800c64d6, nr=0xd6, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0086420, nr=0x20, dev 0xe200, auth=1
[drm:drm_addctx] 2
[drm:drm_unlocked_ioctl] pid=4753, cmd=0xc0106403, nr=0x03, dev 0xe200, auth=1
[drm:drm_irq_by_busid] 0:2:0 => IRQ 17
[drm:drm_unlocked_ioctl] pid=4753, cmd=0x40086414, nr=0x14, dev 0xe200, auth=1
[drm:drm_irq_install] drm_irq_install: irq=17
[psb:0x01:psb_irq_postinstall] Setting up MSVDX IRQs.....
[drm:drm_unlocked_ioctl] pid=4753, cmd=0x4008642a, nr=0x2a, dev 0xe200, auth=1
[drm:drm_lock] 2 (pid 4753) requests lock (0x80000001), flags = 0x00000000
[drm:drm_lock] 2 interrupted

This second lock failed here. This is where we Ctrl-C'ed the X server running under gdb. Now to figure out what this second lock is....

[drm:drm_unlocked_ioctl] ret = -512
[drm:drm_vm_shm_close] 0xb7b39000,0x00002000
[drm:drm_release] open_count = 1
[drm:drm_release] pid = 4753, device = 0xe200, open_count = 1
[drm:drm_release] File f7762a80 released, freeing lock for context 1
[drm:drm_fasync] fd = -1, device = 0xe200
[drm:drm_lastclose]
[drm:drm_lastclose] driver lastclose completed
[drm:drm_irq_uninstall] drm_irq_uninstall: irq=17
[drm:drm_lastclose] lastclose completed
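
With this much debug output, it helps to pull out just the lock traffic. The filter below is a sketch that assumes the [drm:drm_lock] message formats shown in the log above ("requests lock", "has lock", "interrupted"); the function name drm_lock_summary is made up. It reports the last state seen for each context.

```shell
#!/bin/sh
# Sketch only: summarise drm_lock messages per context from DRM debug
# output. Reads the named files, or stdin if none are given.
# Assumes the message wording seen in the log above.
drm_lock_summary() {
    awk '
        /\[drm:drm_lock\]/ {
            # The context number is the field right after the tag, even
            # if the line carries a timestamp prefix.
            ctx = ""
            for (i = 1; i < NF; i++)
                if ($i == "[drm:drm_lock]") { ctx = $(i + 1); break }
            if (ctx == "") next
            if (!(ctx in seen)) { seen[ctx] = 1; order[++n] = ctx }
            if ($0 ~ /requests lock/)    state[ctx] = "requested"
            else if ($0 ~ /has lock/)    state[ctx] = "acquired"
            else if ($0 ~ /interrupted/) state[ctx] = "interrupted"
        }
        END {
            for (i = 1; i <= n; i++)
                print "context " order[i] ": " state[order[i]]
        }
    ' "$@"
}
```

Typical use would be "dmesg | drm_lock_summary". A context that stays at "requested" or ends at "interrupted" never got the lock.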

amitk | Thu, 2008-01-17 15:54

Aha! This shows where things are failing:

(gdb) bt
#0 0xb7fd440e in __kernel_vsyscall ()
#1 0xb7e25bc9 in ioctl () from /lib/libc.so.6
#2 0xb7bc4a1d in drmGetLock (fd=10, context=2, flags=3213662516) at xf86drm.c:1259
#3 0xb7bb951b in psbDRILock (pScrn=0x820bcd0, flags=0) at psb_dri.c:724
#4 0xb7baf983 in psbScreenInit (scrnIndex=0, pScreen=0x821b2d8, argc=6, argv=0xbf8c9b74) at psb_driver.c:1342
#5 0x08073ddb in AddScreen (pfnInit=0xb7baf7f0 , argc=6, argv=0xbf8c9b74) at ../../dix/main.c:769
#6 0x080a6e39 in InitOutput (pScreenInfo=0x81fa640, argc=6, argv=0xbf8c9b74) at ../../../../hw/xfree86/common/xf86Init.c:850
#7 0x08074590 in main (argc=6, argv=0xbf8c9b74, envp=Cannot access memory at address 0x40086432
) at ../../dix/main.c:369

Looking at the drmGetLock() routine in libdrm, what's happening is it's getting stuck in a loop calling ioctl(), which is not succeeding for some reason:

int drmGetLock(int fd, drm_context_t context, drmLockFlags flags)
{
    ...
    while (ioctl(fd, DRM_IOCTL_LOCK, &lock))
        ;
    return 0;
}

So at this point, it's over to amitk to wrestle down where in the kernel things are blowing up.
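
To tie the ioctl() in that loop back to the kernel debug output, the cmd= values in the drm log can be decoded by hand. The sketch below assumes the usual Linux _IOC bit layout as used on x86 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31); the helper name decode_ioctl is made up.

```shell
#!/bin/sh
# Sketch only: split a Linux ioctl command number into its _IOC fields.
# Bit layout assumed here is the generic one used on x86; a few other
# architectures encode direction and size differently.
decode_ioctl() {
    cmd=$(( $1 ))
    nr=$((   cmd         & 0xff   ))   # command number within the driver
    typ=$((  (cmd >> 8)  & 0xff   ))   # "magic" type byte, 0x64 = 'd' for DRM
    size=$(( (cmd >> 16) & 0x3fff ))   # size of the argument struct in bytes
    dir=$((  (cmd >> 30) & 0x3    ))   # data transfer direction
    case $dir in
        0) d=none ;;
        1) d=write ;;
        2) d=read ;;
        3) d=read/write ;;
    esac
    printf 'dir=%s size=%d type=0x%02x nr=0x%02x\n' "$d" "$size" "$typ" "$nr"
}

decode_ioctl 0x4008642a   # prints: dir=write size=8 type=0x64 nr=0x2a
```

The cmd=0x4008642a call is the last ioctl logged before each drm_lock message, and nr=0x2a is the lock request number in drm.h, so that is the DRM_IOCTL_LOCK call drmGetLock() keeps retrying.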

bryce | Thu, 2008-01-17 00:23

Thanks to amitk's new kernel update and Mithrandir's new image, we finally have networking.

Here's a backtrace of running gdb against xinit:

cd /etc/X11
xinit /etc/X11/xinit/xinitrc -- /usr/bin/X -config xorg-new.conf
...
(==) Log file: "/var/log/Xorg.0.log", Time: Wed Jan 16 21:00:40 2008
(++) Using config file: "/etc/X11/xorg-new.conf"
(EE) PSB: Failed to load module "Xpsb" (module does not exist, 0)

(gdb)
(gdb) bt
#0 0xb7fa1410 in __kernel_vsyscall ()
#1 0xb7d8f4e2 in sigsuspend () from /lib/libc.so.6
#2 0x08049a97 in ?? ()
#3 0xbf981238 in ?? ()
#4 0x00001142 in ?? ()
#5 0xffffffff in ?? ()
#6 0xb7eab140 in ?? () from /lib/libc.so.6
#7 0x00000000 in ?? ()

And the last several lines from Xorg.0.log:

drmOpenByBusid: drmOpenMinor returns 10
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
(II) [drm] loaded kernel module for "psb" driver.
(II) [drm] DRM interface version 1.3
(II) [drm] DRM open master succeeded.
(II) PSB(0): [drm] Using the DRM lock SAREA also for drawables.
(II) PSB(0): [drm] framebuffer handle = 0x3fc00000
(II) PSB(0): [drm] added 1 reserved context for kernel
(II) PSB(0): X context handle = 0x1
(II) PSB(0): [drm] installed DRM signal handler
(II) PSB(0): [drm] Allocated (legacy) device DRM context 2.
(II) [drm] Irq handler installed for IRQ 17.
(II) PSB(0): Creating memory manager.
(II) PSB(0): Debug: serverGeneration
(II) PSB(0): Locking DRI for screen

bryce | Wed, 2008-01-16 21:16

Looking deeper into why the -psb code would be failing in the memory manager, I noticed that the package was not building libmm at all. Hmm!

I've fixed that up and uploaded a new package:

http://people.ubuntu.com/~bryce/Testing/xserver-xorg-video-psb/

This also includes some debugging stuff (necessary only if my theory about libmm is incorrect), drops the Gutsy-specific EXA 2.2 workaround, and adopts the ubuntu-mobile numbering style.

Next step is getting a lpia build of it.

bryce | Wed, 2008-01-16 05:01

The above line is the last recorded into /var/log/Xorg.0.log. Looking at the driver source, this is printed from function psbDRMIrqInit, which is called from three places:

* psbDeviceLegacyDRIInit() in psb_dvi.c
* psbDRMDeviceInit() in psb_dvi.c
* psbEnterVt() in psb_driver.c

Looking at the surrounding X_INFO messages, the third call doesn't match up. Actually the first call appears to be the best match, although I'm not certain.

In any case, assuming it is psbDeviceLegacyDRIInit() where things are getting stuck, the next executable logic following the IRQ message deals with creating a memory manager:

pDevice->man = mmCreateDRM(pDevice->drmFD);

There's some conditional logic that may or may not be getting called; I'm not sure. Eventually, the next debug message appears to be in the shadowfb memory allocation section:

PSB_DEBUG(scrnIndex, 3, "Shadow\n");

If not the memory manager creation, then potentially it's getting stuck in xf86InitialConfiguration(), the scanout logic, or the mi layer logic. Unless we can get a backtrace, we'll need to add more debug statements I guess.

bryce | Wed, 2008-01-16 02:26

mjg59 offered this tip:

bryce: Mount / -o sync
That way you'll get more xorg.log

Since UME uses squashfs, /etc/fstab is not used for mount options. So instead, I guessed that the sync option needs to be added to /etc/initramfs-tools/disk:

mount -o rw,noatime,nodiratime,sync /dev/${device}2 /persistmnt

Unfortunately, this seems to not have made a difference to the log. Either it isn't sync'ing, or no further log data was available.

bryce | Wed, 2008-01-16 02:03

Turning NoAccel, SWcursor, and IgnoreACPI on had no effect on the lockup.

Neither did ShadowFB.

bryce | Wed, 2008-01-16 01:16

Here are the options supported by this driver:

static const OptionInfoRec psbOptions[] = {
    {OPTION_SHADOWFB,    "ShadowFB",   OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_NOACCEL,     "NoAccel",    OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_SWCURSOR,    "SWcursor",   OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_EXAMEM,      "ExaMem",     OPTV_INTEGER, {0}, FALSE},
    {OPTION_EXASCRATCH,  "ExaScratch", OPTV_INTEGER, {0}, FALSE},
    {OPTION_EXACACHED,   "ExaCached",  OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_DRI,         "DRI",        OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_IGNORE_ACPI, "IgnoreACPI", OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_NOPANEL,     "NoPanel",    OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_LIDTIMER,    "LidTimer",   OPTV_BOOLEAN, {0}, FALSE},
    {OPTION_NOFITTING,   "NoFitting",  OPTV_BOOLEAN, {0}, FALSE},
    {-1,                 NULL,         OPTV_NONE,    {0}, FALSE},
};

bryce | Wed, 2008-01-16 00:57

I'm running through a list of various troubleshooting tips from X/Debugging. Many of these are probably not even relevant for -psb, but just wanted to run through and try them each at least once to see if they reveal any clues:

1. Attempting to unload "dri" and "glx" options in the "Module" section didn't do anything - the X log says they're required and overrides the unload request.

2. Adding this to the "Device" section:

Option "DRI" "false"

...causes X to fail to start up, with an error that it needs DRI enabled for the Poulsbo driver.

3. Adding to the "Device" section:

Option "VBERestore" "false"

...had no effect.

4. Adding to "Extensions"

Option "Composite" "Disable"

...had no effect.

5. Adding to "ServerLayout"

Option "AIGLX" "false"

...had no effect.

bryce | Wed, 2008-01-16 00:41

In the Xorg.0.log after booting -psb, the exa version listed was compiled for 1.4.0.90, module version 2.2.0. Also, the date on the file matches that of other bits of xserver, and does not match the date on the -psb driver. This isn't what I'd expected, and makes me think it's not a libexa problem after all.

I notice the Xorg.0.log ends with some items about trying to load drm. Nothing there looks like an obvious error; however, I'd expect some stuff to follow in the log regarding input devices, EDID info, modelines, etc.

bryce | Tue, 2008-01-15 23:59

This error is probably not an issue:

(EE) PSB: Failed to load module "Xpsb" (module does not exist, 0)
(WW) PSB(0): Poulsbo Xpsb driver not available. XVideo and 3D acceleration will not work

I believe this just indicates the 3D driver is not present (I don't think this has been delivered yet). I don't think it is related to the lockup.

bryce | Tue, 2008-01-15 23:53
