Installing two Nvidia Titan X Pascal GPUs in Ubuntu Mate

A_FOLLOWER_TO · 23 January 2017 08:40

Dear all,

I am quite new to Ubuntu, which is why I would be thankful for any help.

After installing Ubuntu Mate 16.04 LTS I noticed that my two Titan X Pascal cards were not recognized by Mate. The standard driver "Nouveau" was still active (according to the "Software & Updates" window) after the installation. First, I updated and upgraded the operating system using the "Welcome" functionality of Mate.

Then I followed this instruction, and after a reboot the "Software & Updates" window looks as follows:

According to the Nvidia website, the 375 driver is the right one for a Titan X Pascal. After installing the 375 driver, however, the system still does not recognize the cards as Titan X Pascal (see screenshot above). If I interpret the output of the lspci command correctly, the hardware has been recognized. The same holds for the lshw output (unfortunately, I am not allowed to include more than one picture here as I am a new user).
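
One way to tell whether the hardware is merely detected or actually driven by the Nvidia module is to look at the kernel driver bound to each card. The sketch below filters lspci -k style text; the function name nvidia_kernel_drivers is made up for illustration, and the usual pciutils output layout is assumed.

```shell
#!/bin/sh
# Print each NVIDIA device line together with the kernel driver bound to it.
# Reads `lspci -k` style text on stdin, so it also works on saved output.
nvidia_kernel_drivers() {
    awk '
        # Device lines start with the bus address (hex digits).
        /^[0-9a-fA-F]/ { show = (tolower($0) ~ /nvidia/); if (show) print; next }
        # Indented detail lines belong to the device line above them.
        show && /Kernel driver in use:/ { sub(/^[ \t]+/, ""); print "  " $0 }
    '
}

# Typical use on a live system (requires pciutils):
#   lspci -k | nvidia_kernel_drivers
```

If the driver line reads nouveau, or is missing entirely, the proprietary driver may be installed but is not the one in use.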

In addition, I cannot change the standard resolution of 1024x768 in "Displays" which is kind of annoying. I have read possible solutions for this problem but I am not sure if these would work given that I am not sure if the driver is properly installed and activated.

If you need any other relevant information to get a better understanding of the situation, please let me know.

Why does the GPUs' name not appear in the "Software & Updates" window (see first picture above), although the outputs of the lspci and lshw commands seem to be fine?
How can I fix the problem(s) described above?

Thanks a lot and best regards,

Alex

wolfman · 23 January 2017 09:34

Hi @A_FOLLOWER_TO,

click on the 378.09 package, then “Apply”, restart your PC and see if there are any improvements!

A_FOLLOWER_TO · 23 January 2017 10:03

Dear @wolfman,

Thanks for your help.

Unfortunately, nothing has changed after selecting the 378 driver and a reboot. The lspci as well as the lshw outputs have also not changed. Here is the lspci output:

Does it matter which PCIe slots the GPUs are plugged into?
I have not used the first PCIe slot but the 3rd and the 5th. In addition, I have used an SLI bridge for the two cards.

Best
Alex

wolfman · 23 January 2017 13:50

Hi @A_FOLLOWER_TO,

try the following terminal command (Ctrl + Alt + t) and restart again:

sudo apt-get update && sudo apt-get remove nvidia-375 && sudo apt-get install nvidia-378
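
After swapping driver versions like this, it can help to confirm which driver packages are actually left installed. A minimal sketch, assuming the nvidia-<version> package naming used in the command above; the helper name installed_nvidia_drivers is hypothetical.

```shell
#!/bin/sh
# List fully installed (state "ii") nvidia-<version> driver packages.
# Reads `dpkg -l` style text on stdin; removed packages whose config files
# remain (state "rc") are deliberately skipped.
installed_nvidia_drivers() {
    awk '$1 == "ii" && $2 ~ /^nvidia-[0-9]+/ { print $2, $3 }'
}

# Typical use on a live system:
#   dpkg -l | installed_nvidia_drivers
```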

Have you checked whether you can change the resolution with the Nvidia tool?

In a terminal:

gksudo nvidia-settings

and check the resolution from there, then click “Save to X Configuration File” and close!

If gksudo is missing (the Nvidia tool itself should have been installed by default with the drivers), install it with:

sudo apt-get install gksu

I am not sure which slots the GPUs should be in; take a look at the following link:

https://devtalk.nvidia.com/default/topic/981774/linux/2-titan-x-pascal-on-ubuntu-16-04-/

A_FOLLOWER_TO · 23 January 2017 16:01

Hi @wolfman,

Thanks again for your help - very much appreciated.

After the update, the removal of the 375 driver, the installation of the 378 driver and the reboot, nothing has changed. I followed your hint regarding the Nvidia X Server tool. You are right, the tool is already installed on my system. It looks as follows:

I am not an expert, but it looks quite empty to me, so I couldn't follow your instructions regarding the resolution in the Nvidia tool.
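
A nearly empty Nvidia X Server Settings window often goes along with the Nvidia kernel module not being loaded, so that is worth ruling out. The following sketch checks lsmod style text for the module; nvidia_module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds (exit status 0) if a module named exactly "nvidia" appears in
# `lsmod` style text read from stdin; fails (exit status 1) otherwise.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit found ? 0 : 1 }'
}

# Typical use on a live system:
#   if lsmod | nvidia_module_loaded; then
#       echo "nvidia module is loaded"
#   else
#       echo "nvidia module is NOT loaded"
#   fi
```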

I also checked whether an xorg.conf file is stored in /etc/X11/, but there is nothing (I am not sure whether that file is needed or relevant under 16.04). However, I did find the /usr/share/X11/xorg.conf.d/ folder, which contains several (config) files plus a folder.

What does this mean/imply?

Cheers!
Alex

ouroumov · 23 January 2017 16:26

A while back I installed two TITAN X into a compute server.
Now, the use case is clearly not the same, but the instructions I followed back then were the ones from NVIDIA about installing CUDA: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Hope it helps.

DaveB · 23 January 2017 18:19

Hi @ouroumov,
If helpful, CUDA is now included in the drivers and thankfully no longer needs to be installed separately (unless you wish to develop CUDA applications).

DaveB · 23 January 2017 18:30

Hi @A_FOLLOWER_TO,

Will this system be used for CUDA GPGPU (Blender Cycles rendering, for instance)? If yes, removing the SLI link will offer improved GPGPU performance.

Is this a headless system (no display attached directly to one of the NVIDIA GPUs), or are you instead trying to use Prime through an on-board Intel GPU?

A_FOLLOWER_TO · 23 January 2017 18:49

Dear all,

First of all thanks a lot for all your help and support.

I am currently trying to correctly follow the installation guidance referenced by @ouroumov. It is quite comprehensive and I hope this will sort everything out.

@DaveB: Thanks for your hint regarding the SLI bridge and CUDA. I actually would like to use CUDA going forward. However, I am also new to this area. As a mathematician I am interested in the theoretical background of neural networks and their applications in math. The system on which I would like to install Mate is my (high-end) PC, which is based on Nvidia’s DevBox. Yes, I have read that an SLI bridge can be counter-productive for deep learning. I just used it for testing and benchmarking purposes. I am going to remove it as soon as the actual work starts.

I will post again as soon as I am sure that I considered all the steps.

Best
Alex

wolfman · 23 January 2017 18:59

Hi @A_FOLLOWER_TO,

your nvidia config settings are indeed lacking; mine has an awful lot more. I hope the others can help you further!

A_FOLLOWER_TO · 24 January 2017 14:28

Dear all,

Unfortunately, Nvidia’s description of how to install CUDA did not sort the issue out (I followed the package manager approach).

In addition, I plugged just one Titan X Pascal into the first slot (as recommended by the mainboard manufacturer), but I ended up with the same result.

I am a bit at a loss as to what to do next. Any suggestions?

Best regards
Alex

wolfman · 24 January 2017 19:50

Hi @A_FOLLOWER_TO,

try updating (and fixing any missing dependencies), then removing and re-installing gksu, which is used to launch the Nvidia tool.

Open a terminal with Ctrl + Alt + t and run the command below (it is a single command for ease of use!):

sudo apt-get update && sudo apt-get dist-upgrade -f && sudo apt-get remove gksu && sudo apt-get install gksu

RESTART AFTER RUNNING THE ABOVE COMMAND!

A_FOLLOWER_TO · 25 January 2017 11:26

Dear all,

A big thanks to all of you, in particular, to @wolfman.
I really appreciate your help and support.

I just solved the issue by double-checking the output of the driver installation line by line. I noticed that during the installation process one inconspicuous warning message, “failed to request new MokSB state”, was displayed. This led me to the root cause of the described issues: the “Secure Boot” mode. Strangely enough, Ubuntu seemingly tried to deactivate Secure Boot during the driver installation, but apparently failed to do so.
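
Reading installer output line by line is easy to get wrong, so a simple filter can help surface messages like the one above. This is only a sketch: it assumes the installer output was saved to a file (install.log is a hypothetical name), and the keyword list is a guess rather than anything the driver packages define.

```shell
#!/bin/sh
# Print, with line numbers, every line that looks like a warning or failure.
# Reads saved installer output on stdin.
flag_install_problems() {
    grep -n -i -E 'warning|failed|error' || echo "no suspicious lines found"
}

# Typical use (install.log is a hypothetical file name):
#   flag_install_problems < install.log
```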

After deactivating “Secure Boot” in the BIOS and re-installing the drivers, everything works fine.
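
Whether Secure Boot is still active can also be checked from the running system before re-installing anything. A minimal sketch, assuming the mokutil package is installed; secure_boot_state is a hypothetical helper that only classifies the text printed by mokutil --sb-state.

```shell
#!/bin/sh
# Classify the text printed by `mokutil --sb-state`.
secure_boot_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# On a live system: only runs if mokutil is available.
if command -v mokutil >/dev/null 2>&1; then
    state=$(secure_boot_state "$(mokutil --sb-state 2>&1)")
    echo "Secure Boot: $state"
    if [ "$state" = "enabled" ]; then
        echo "Unsigned kernel modules such as the Nvidia driver may be refused."
    fi
fi
```

A result of "unknown" usually just means the state could not be read, for example on a machine booted in legacy BIOS mode.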

Best regards
Alex