Overclocking NVIDIA GPUs on Linux


First, install the NVIDIA drivers. You can download them from the NVIDIA site, or use the graphics-drivers PPA, which works for me:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-384
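
After installing (and rebooting) you can check that the driver loaded; nvidia-smi ships with it:

nvidia-smi --query-gpu=name,driver_version --format=csv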

Next, you can install the CUDA toolkit if you need it:
https://developer.nvidia.com/cuda-downloads

Check out nvidia-settings:

root@min01:~/bin# nvidia-settings -c :0 -q gpus
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

3 GPUs on min01:0

    [0] min01:0[gpu:0] (GeForce GTX 1060 3GB)
      Has the following names:
        GPU-0
        GPU-d530dd35-f958-1a38-d147-c1ecc2b234eb

If you see the following instead, nvidia-settings cannot reach an X display (typical on a headless rig):

root@min01:~# nvidia-settings -c :0 -q gpus
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
No protocol specified
ERROR: Unable to find display on any available system
No protocol specified
ERROR: Unable to find display on any available system
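
A common cause is that root cannot reach the X server: DISPLAY is unset or the X authority cookie is missing ("No protocol specified"). A sketch that often helps, assuming X runs on :0 and was started by root:

export DISPLAY=:0
export XAUTHORITY=/root/.Xauthority   # point at the .Xauthority of whoever started X
nvidia-settings -q gpus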

If you need an edid.bin to fake a monitor on a headless card, use these commands to install the tools and capture a copy of your main screen's EDID:

sudo apt install read-edid edid-decode
# the integer after -m is the monitor id, starting from zero and incrementing by one.
sudo get-edid -m 0 > edid.bin
# View the output of this command and verify you have the right monitor.
# You can tell via the vendor, resolutions, serial number, all that jazz.
cat edid.bin | edid-decode
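
Where the kernel exposes DRM connectors you can also read the EDID straight from sysfs; a sketch, and note the connector name is an assumption (the proprietary driver does not always populate these files):

ls /sys/class/drm/                            # list available connectors
cat /sys/class/drm/card0-HDMI-A-1/edid > edid.bin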

Original tutorial: http://kodi.wiki/view/Creating_and_using_edid.bin_via_xorg.conf

Now copy your edid.bin into the X11 directory:

cp edid.bin /etc/X11/

After this you can change the X11 config.

To enable overclocking, you need to set the Coolbits option, which controls this.

The Coolbits value is the sum of its component bits in the binary numeral system. The component bits are:

  • 1 (bit 0) - Enables overclocking of older (pre-Fermi) cores on the Clock Frequencies page in nvidia-settings.
  • 2 (bit 1) - When this bit is set, the driver will "attempt to initialize SLI when using GPUs with different amounts of video memory".
  • 4 (bit 2) - Enables manual configuration of GPU fan speed on the Thermal Monitor page in nvidia-settings.
  • 8 (bit 3) - Enables overclocking on the PowerMizer page in nvidia-settings. Available since version 337.12 for the Fermi architecture and newer.
  • 16 (bit 4) - Enables overvoltage using nvidia-settings CLI options. Available since version 346.16 for the Fermi architecture and newer.

To enable multiple features, add the Coolbits values together. For example, to enable overclocking and overvoltage of Fermi cores, set Option "Coolbits" "24".

Check out more nice tips here: https://wiki.archlinux.org/index.php/NVIDIA/Tips_and_tricks
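
The config below was generated with nvidia-xconfig (see its header) and then edited by hand. A sketch of a generating command; the exact flag choices are an assumption:

sudo nvidia-xconfig --enable-all-gpus --cool-bits=31 --allow-empty-initial-configuration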

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 384.69  (buildmeister@swio-display-x86-rhel47-06)  Wed Aug 16 20:57:01 PDT 2017

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    Screen      1  "Screen1" 0 0
    Screen      2  "Screen2" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1060 3GB"
    Option         "Coolbits" "31"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070"
    BusID          "PCI:4:0:0"
    Option         "Coolbits" "31"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070"
    BusID          "PCI:5:0:0"
    Option         "Coolbits" "31"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "12"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Option         "Coolbits" "31"
    Option         "UseDisplayDevice" "none"
    Option         "RegistryDwords" "PerfLevelSrc=0x2222"
EndSection

Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
    Option         "Coolbits" "31"
    Option         "UseDisplayDevice" "none"
    Option         "RegistryDwords" "PerfLevelSrc=0x2222"
EndSection

Stop the display manager and start X so the new config takes effect:

root@min01:~# service gdm stop
root@min01:~# sudo startx -- :0
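
If X fails to come up, the warnings and errors in its log usually point at the culprit:

grep -E '\(EE\)|\(WW\)' /var/log/Xorg.0.log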

Finally, you can change the card's clocks like this:
# Apply a +850 MHz memory transfer rate offset and +100 MHz on the GPU clock
nvidia-settings -c :0 -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=850'
nvidia-settings -c :0 -a '[gpu:0]/GPUGraphicsClockOffset[3]=100'
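
The same attributes work for the other cards; a sketch applying the offsets to the two 1070s (gpu:1 and gpu:2). Stable values differ per card, so treat the numbers as examples:

for i in 1 2; do
    nvidia-settings -c :0 -a "[gpu:$i]/GPUMemoryTransferRateOffset[3]=850"
    nvidia-settings -c :0 -a "[gpu:$i]/GPUGraphicsClockOffset[3]=100"
done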


Change the power limit

Check the current usage and caps with nvidia-smi:

root@min01:~/bin# nvidia-smi
Wed Sep  6 11:29:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 60%   62C    P2   110W / 120W |   2378MiB /  3010MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:04:00.0 Off |                  N/A |
| 53%   78C    P2   148W / 151W |   2292MiB /  8114MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1070    Off  | 00000000:05:00.0 Off |                  N/A |
| 52%   74C    P2   150W / 151W |   2292MiB /  8114MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

# set power limit to 100W
sudo nvidia-smi -i 00000000:01:00.0 -pl 100
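
The limit resets on every reboot, so re-apply it at boot (rc.local, a systemd unit, or similar). A sketch; the indices and watt values here are examples:

#!/bin/sh
nvidia-smi -pm 1         # persistence mode, so settings stick while no client runs
nvidia-smi -i 0 -pl 100  # GTX 1060
nvidia-smi -i 1 -pl 130  # GTX 1070
nvidia-smi -i 2 -pl 130  # GTX 1070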

You can also pin a fan speed; this needs Coolbits bit 2 set, which the value 31 above includes:

sudo nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=70'
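
To hand control back to the automatic fan curve later, flip the state off:

sudo nvidia-settings -a '[gpu:0]/GPUFanControlState=0'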


Links

https://gist.github.com/bsodmike/369f8a202c5a5c97cfbd481264d549e9
https://www.reddit.com/r/EtherMining/comments/6gfnzi/overclocking_of_multiple_gtx_1070_cards_on_375/
https://unix.stackexchange.com/questions/367584/how-to-adjust-nvidia-gpu-fan-speed-on-a-headless-node
https://pastebin.com/vgAkJLsR