I recently bought some 10Gb Mellanox cards on eBay to add cheap 10Gb networking to my test cluster. Here is my story, which will hopefully save you some headaches.
My two machines, ClusterF1 and ClusterF2, booted up fine with the cards installed. However, one of the two ports showed as "could not start" in Device Manager and would not work. A quick Google search suggested that the proper driver/firmware update would remedy the problem. The cards I bought were OEM IBM cards, so I ended up having to go to IBM's site to get the driver/firmware update. All is right in the world, right? WRONG….
After downloading and running the installer on ClusterF1 (I was RDP'd in), I lost my network connection. Thinking that it was just doing something with the networking stack, I mistakenly started it on ClusterF2 as well. Later, using the iDRAC console, I discovered that the installer had completely F'd up my fresh 2012R2 install on both machines. I was getting blue screens of death and neither machine would boot back up 😦
In digging into things, it seemed like my best bet was going to be to get some non-OEM firmware installed on the card… so here we go again. I now had my 3rd cluster member to play with while re-installing the OS on the other two.
The cards I bought were labeled as:
Here is the fix:
- Download the Mellanox firmware tools here: http://www.mellanox.com/page/management_tools
- After installing, go to the folder where the tools installed and run:
- mst status
- Get the PCI Device ID of the card and use it to run this command:
- flint -d <pci device ID> query
- Note the PSID (mine was IBMxxxxxxxxxxxxx, as I had an OEM card) and, most importantly, look for what kind of card it is. Mine was an MT26448.
- Dig through the non-OEM Mellanox firmware on their site and find the proper firmware for your card. Download it, extract it, and place it in the firmware tools folder (the firmware for my card was fw-ConnectX2-rel-2_9_1200-MNPH29C-XTR_A2-A5-FlexBoot-3.3.400.bin).
- Run this command: flint -d <PCI Device ID> -allow_psid_change -i <firmware file> burn
- In my case it asked whether I wanted to overwrite the boot ROM, and I said yes. It then warned that I was changing the PSID, and I confirmed that as well.
- It takes a reboot for the card to load the new firmware. I did this, then installed the 4.8 driver (in their archive section) for Server 2012R2, and all is now right with the world. In IO tests with jumbo frames enabled, I'm hitting 9.55Gb/s host to host with some cheap Amazon DAC cables.
Hopefully this helps someone else, good luck!
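
As a rough sanity check on that 9.55Gb/s number, here is a small Python sketch that estimates the best-case TCP throughput on a 10Gb link for a given MTU. It is only back-of-the-envelope arithmetic under some assumptions of mine: IPv4, TCP with no header options, no VLAN tag, and a test that reports TCP payload throughput. The function name is made up for illustration.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + start delimiter (8), MAC header (14), FCS (4), inter-frame gap (12).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12

# Assumption: IPv4 header (20 bytes) + TCP header (20 bytes), no options.
IP_TCP_HEADERS = 20 + 20


def max_tcp_goodput_gbps(mtu, line_rate_gbps=10.0):
    """Best-case TCP payload rate for a given MTU (hypothetical helper)."""
    payload = mtu - IP_TCP_HEADERS      # useful bytes carried per frame
    on_wire = mtu + ETHERNET_OVERHEAD   # bytes the frame occupies on the link
    return line_rate_gbps * payload / on_wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {max_tcp_goodput_gbps(mtu):.2f} Gb/s")
# MTU 1500: 9.49 Gb/s
# MTU 9000: 9.91 Gb/s
```

Under those assumptions, 9.55Gb/s is a bit above what a standard 1500-byte MTU could deliver, which suggests the jumbo frames are doing their job, and it is not far from the roughly 9.9Gb/s ceiling for a 9000-byte MTU.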