Posts in the 'Linux' category (page 2)

Linux

"tput: no value for $TERM" error 2017.11.21
Link error: undefined symbol 2017.03.28
Checking which fds are open 2017.03.03
Memory usage of a process 2017.01.11
How to use rpm 2014.02.13
Linux kernel driver model: the benefits of working together - in Beautiful Code 2010.10.22
Linux kernel: the NMI watchdog 2009.12.31
Passing information from the kernel to user mode 2009.12.30

"tput: no value for $TERM" error

하늘을 나는 미카 2017. 11. 21. 12:12

After installing Linux and tinkering with it for a while, you run into some puzzling errors.

tput: no value for $TERM

This error means that something is using the TERM environment variable while it is not set.

In my case it was used by /etc/profile.d/vte.sh; my environment had apparently been broken while I was installing and removing software.

I created term.sh in /etc/profile.d and exported the variable from there.

export TERM=xterm

Of course, this works best when xterm is actually installed.


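Link error: undefined symbol

하늘을 나는 미카 2017. 3. 28. 14:06

An undefined symbol error is raised during linking, typically when code calls a function whose implementation cannot be found.

Every so often it shows up where, from the developer's point of view, it should not.

Let's look at a few cases.

1) The function is only declared in a header file and has no implementation.

If you are using a prebuilt binary library together with its .h file, the function is probably not implemented inside the library.

2) The function is implemented in the binary, but linking still fails.

When the binary is built as a shared library such as a .so or a .dll, the function may not be exported, so it cannot be used from outside the library.

3) The function is both implemented and exported.

If the function was written in C and is called from a .cpp file, C++ name mangling can cause the problem.

ex) A function void abcd(int a) written in C provides a symbol such as _abcd,

but when the header is included and the function is called from C++, the call refers to a completely different, mangled name such as _abcdi. (With GCC on Linux the two names would be abcd and _Z4abcdi.)

This can be solved with extern "C".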
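A common way to apply this fix is to guard the C declarations in the header so that the same file can be included from both C and C++. The sketch below is only illustrative: the name abcd comes from the example above, while the int return type and the body (doubling the argument) are assumptions made up here so that the function has something to return.

```c
/* abcd.h part: declarations usable from both C and C++ */
#ifdef __cplusplus
extern "C" {            /* ask a C++ compiler not to mangle these names */
#endif

int abcd(int a);

#ifdef __cplusplus
}
#endif

/* abcd.c part: hypothetical implementation, compiled as C */
int abcd(int a)
{
    return a * 2;
}
```

A C compiler never sees the extern "C" lines because __cplusplus is not defined there, so the header stays valid C.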

Most causes fall into one of these three cases.

A separate post covers the related details:

Name mangling in C++

<Note>

There is also a less conventional way to work around undefined symbol errors.

In some projects the build environment lacks a symbol, while the library or the symbol exists only in the runtime environment.

In that situation undefined symbol errors appear at build time,

and one way to avoid them is to tell the linker to ignore undefined symbols.

ld options on Linux: https://linux.die.net/man/1/ld

One of the ld options is --allow-shlib-undefined.

With --allow-shlib-undefined among the link options, undefined symbols in shared libraries no longer cause a build error, and the binary is produced anyway.

It is an escape hatch for unavoidable cases, but overall it is not a good practice.

If a project keeps this option on, a function whose implementation is genuinely missing can go unnoticed,

and finding the cause later takes a great deal of time.


Checking which fds are open

하늘을 나는 미카 2017. 3. 3. 12:26

While coding or debugging,

you sometimes want to check which fds are open:

which sockets are open, which files are currently in use, and so on.

On Linux you can check this per process, under

/proc/<pid>/fd

Running ls in that directory shows the fds in use.


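Memory usage of a process

하늘을 나는 미카 2017. 1. 11. 14:52

USS (Unique Set Size): the number of pages unique to the process, that is, the size of the memory that is private to the process and not shared.

PSS (Proportional Set Size): USS + (shared pages / number of processes sharing them). In other words, the memory unique to the process plus the share of the shared memory that one process accounts for. Suppose process A uses 6MB of memory, of which 2MB is private to it; the remaining 4MB is shared memory. If four processes share those 4MB, the PSS is 2MB + (4MB/4) = 3MB.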
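The formula can be written as a small helper. This is a sketch only: the function name pss and its parameters are chosen for this example, and it assumes, as the worked example above does, that all shared memory is shared by the same number of processes.

```c
/* Proportional Set Size: private memory plus an equal share of the
 * shared memory. All sizes use the same unit (MB in the example above). */
double pss(double uss, double shared, int sharing_processes)
{
    if (sharing_processes <= 0)
        return uss;                     /* nothing is shared */
    return uss + shared / sharing_processes;
}
```

With the numbers from the example, pss(2.0, 4.0, 4) gives 3.0.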

[Source] http://ecogeo.tistory.com/255


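How to use rpm

하늘을 나는 미카 2014. 2. 13. 09:07

rpm -Va : verify every installed package
rpm -qf /absolute/path/to/file : show which installed package (name and version) owns the file
rpm -qRp package.rpm : list the dependencies the package file requires
rpm -qip package.rpm : print the package information recorded in the spec file
rpm -qpl package.rpm : list the files the package would install
rpm2cpio package.rpm | cpio -i --make-directories -E package
: for when you want to unpack an rpm package without installing it, or pick out only specific files (with GNU cpio, the argument of -E is a file listing the name patterns to extract)

Ref 1] http://kltp.kldp.org/tips/KLTP-KLDP-11.html#ss11.3
Ref 2] http://kltp.kldp.org/stories.php?story=01/03/03/2932400
Ref 3] http://man.kldp.org/man/man8/rpm.8.html

[Source] http://coffeenix.net/bbs/viewtopic.php?p=1122&sid=1e91028da92c0e073e06cf120390da84

This comes from a comment that Yasu, the cafe administrator, left on the post above.

It is very useful to me, so I copied it here.

unrpm

An rpm package can also be unpacked with unrpm.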


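Linux kernel driver model: the benefits of working together - in Beautiful Code

하늘을 나는 미카 2010. 10. 22. 13:24

Linux Kernel Driver Model: The Benefits of Working Together

Greg Kroah-Hartman

The Linux kernel driver model is an attempt to build a single system-wide tree covering every type of device that the operating system manages.

Over the years, the core data structures and code behind it have evolved from a very simple system that handled a handful of devices into a highly scalable system that controls every type of device the real world requires.

As the Linux kernel evolved and had to handle more and more kinds of peripheral devices, the core of the kernel evolved with it, adopting ways to manage that variety of device types more easily.

Almost every device consists of two distinct parts. One is the physical part, which defines how the operating system talks to the device (over a PCI bus, SCSI bus, ISA bus, USB bus, and so on); the other is the virtual part, which defines how the operating system presents the device to the user so that it can be used properly (keyboard, mouse, video, sound, and so on).

In the 2.4 kernel releases, the physical part of a device was controlled by bus-specific code. This bus code was responsible for a wide range of tasks, and each individual bus's code did not interact with that of any other bus.

In 2001, Pat Mochel was working on power management in the Linux kernel when he realized that, in order to power individual devices on or off properly, the kernel had to know how the different devices were connected to one another. For example, a USB disk drive must be shut down before the USB controller it is attached to, and that controller before the PCI controller above it, so that outstanding data can be saved properly to the device. Solving this requires a single tree showing how all the devices in the system are connected and in what order.

Other operating systems had solved this either by keeping a small database in the kernel to handle device naming, or by exporting every possible unique property of a device through a devfs-type filesystem that could be used to access the device directly. For Linux, however, a database inside the kernel was not acceptable. In addition, Linux's devfs implementation contained several well-known and unfixable race conditions, so almost no Linux distribution trusted it. The devfs approach also forced a particular naming policy on the user. Some saw that as an advantage, but the policy did not match the published Linux device naming standard, and it prevented users from applying a different naming policy of their own.

Mochel and I (Greg Kroah-Hartman) realized that our problems could be solved by a single unified driver and device model inside the Linux kernel. Such a unified model was not a new idea; operating systems had adopted one before. It was simply time for Linux to use one as well.

What we needed was a unified model that could be used to build a tree of all devices and that would also let userspace programs outside the kernel handle persistent names for any device in whatever way the user wanted.

-- Humble beginnings

We started by creating a simple structure, device, to serve as a "base" class for every device in the kernel.

Initially the structure looked like this.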

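struct device {
    struct list_head node;          /* sibling node */
    struct list_head children;
    struct device *parent;
    char name[DEVICE_NAME_SIZE];    /* descriptive ASCII string */
    char bus_id[BUS_ID_SIZE];       /* position on the parent bus */
    spinlock_t lock;                /* lock that keeps two different layers
                                     * from accessing the device at once */
    atomic_t refcount;              /* reference count that keeps the device
                                     * around for the right amount of time */
    struct driver_dir_entry *dir;
    struct device_driver *driver;   /* driver that allocated this device */
    void *driver_data;              /* data private to this device */
    void *platform_data;            /* platform-specific data (e.g. ACPI,
                                     * BIOS data for the device) */
    u32 current_state;              /* current operating state; in ACPI terms
                                     * this is D0-D3, where D0 means fully on
                                     * and D3 means fully off */
    unsigned char *saved_state;     /* saved device state */
};

Every time this structure is created and registered with the kernel driver core, an entry representing the device and its various attributes is created in a virtual filesystem. This exposes every device in the system to userspace, and userspace programs reach the device they want through this virtual filesystem. Today this virtual filesystem is called sysfs, and it can be seen under /sys/devices on any machine running Linux. The following is the part corresponding to a few PCI and USB devices.

/* The listing of Linux devices belongs here, but it was too complex to type in, so it is omitted. */

Bus-specific structures embed struct device as one of their members. The USB interface structure is one example.

struct usb_interface {
    struct usb_interface_descriptor *altsetting;
    int act_altsetting;             /* active alternate setting */
    int num_altsetting;             /* number of alternate settings */
    int max_altsetting;             /* total memory allocated */
    struct usb_driver *driver;      /* driver */
    struct device dev;              /* interface-specific device info */
};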

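The driver core passes around pointers to struct device and works on the basic fields of that structure, that is, the fields common to all devices.

When such a pointer is handed to bus-specific code for its various functions, it has to be converted to the actual structure type that contains it. To do this, the bus-specific code casts the pointer back to the original structure based on where the pointer sits in memory. The following macro does the job.

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type, member) ); })

As an example, here is code that converts a pointer to the struct device member of the original structure into a pointer to the struct usb_interface shown earlier.

int probe(struct device *d) {
    struct usb_interface *intf;

    intf = container_of(d, struct usb_interface, dev);
}

Thanks to simple techniques such as the container_of macro, the Linux kernel can inherit from and manipulate ordinary C structures in very powerful ways. That power, of course, assumes the developer understands how they work and uses them correctly.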
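To try the idea outside the kernel, here is a minimal user-space sketch. It is not the kernel's macro: this variant drops the typeof check and the GNU statement expression so that it compiles as standard C, which also means it does not verify the type of ptr. The structures base_dev and usb_like and the helper to_usb_like are hypothetical stand-ins invented for this example.

```c
#include <stddef.h>   /* offsetof */

/* Simplified variant of the macro: subtract the member's offset from the
 * member's address to get the address of the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct base_dev {                 /* hypothetical stand-in for struct device */
    int id;
};

struct usb_like {                 /* hypothetical stand-in for struct usb_interface */
    int num_altsetting;
    struct base_dev dev;          /* embedded "base class" */
};

/* Recover the enclosing structure from a pointer to its embedded member */
struct usb_like *to_usb_like(struct base_dev *d)
{
    return container_of(d, struct usb_like, dev);
}
```

Given a struct usb_like u, to_usb_like(&u.dev) yields &u again. Passing a struct base_dev that is not embedded in a struct usb_like produces a bogus pointer, and nothing detects that at run time.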

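Some readers may wonder why there is no run-time check that the pointer passed in as a struct device really is of type struct usb_interface.

Traditionally, most systems that do this kind of pointer manipulation keep a field in the base structure that defines the type of the pointer being manipulated, and use it to catch programmers who specify the wrong type. Such a field is also handy for writing code that determines the pointer's type dynamically at run time and acts differently depending on it.

The Linux kernel developers, however, decided to leave out this kind of check and type definition. Such checks can catch basic programming mistakes early in development, but they can also be abused to enable clever hacks that later lead to much subtler problems that are hard to track down.

Because there is no run-time type checking, developers who work with these pointers must know exactly what type of pointer they are handling and passing on. At times they will wish there were some way to find out the concrete type of the struct device they are looking at, but that wish goes away once the problem has been properly debugged.

Can the lack of type checking contribute to the beauty of code? Having worked on this area for more than five years, my answer is "yes".

Its absence prevents the easy hacks that tend to crop up in the kernel and forces everyone to get their logic exactly right. Forcing developers not to rely on type checks heads off bugs before they happen.

I should point out here that the developers who work on this part of the kernel (those who code the subsystems for the common buses) are relatively few, and can be expected to have considerable expertise and experience. That is also why the safety net of type checking is left out.

Thanks to this way of inheriting from the base struct device, we were able to unify all the different driver subsystems during the 2.5 kernel development process. The driver subsystems now all share common core code, which lets the kernel show users how devices are connected to one another.

It also became possible to write tools such as udev, which does persistent device naming in a small userspace program, and power management modules that walk the device tree and shut devices down in the proper order.

-- Reducing to even smaller pieces

While the initial driver core was being reworked, another kernel developer, Al Viro, was fixing a number of problems with object reference counting in the virtual filesystem layer.

The main problem with structures in multithreaded programs written in C is that it is very hard to determine when the memory used by a structure can safely be freed. The Linux kernel is a massively multithreaded program that must properly handle not only careless users but also large numbers of processors running at the same time. Reference counting is therefore essential for any structure used by more than one thread.

struct device was one of those reference-counted structures. It has the following field, used to decide whether it is safe to free the structure.

atomic_t refcount;

When the structure is first initialized, this field is set to 1. Code that wants to use the structure must first call get_device to increment the reference count. That function checks that the existing count is valid and then increments it by one.

static inline void get_device(struct device *dev)
{
    BUG_ON(!atomic_read(&dev->refcount));
    atomic_inc(&dev->refcount);
}

Similarly, when a thread has finished using the structure, it calls put_device to decrement the count by one. This function is a bit more complex.

void put_device(struct device *dev)
{
    if (!atomic_dec_and_lock(&dev->refcount, &device_lock))
        return;

    /* Tell the driver to free the structure.
     * We most likely did not allocate the device ourselves,
     * so this is the driver's chance to release it... */
    if (dev->driver && dev->driver->remove)
        dev->driver->remove(dev, REMOVE_FREE_RESOURCES);
}

This function decrements the structure's reference count. If the count reaches 0, the structure is no longer in use, so the driver is told to clean it up (by calling the function the system has set up for that purpose).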

Viro liked the unification around struct device, the virtual filesystem that showed all the different devices and how they were connected, and the automatic reference counting. The problem was that his virtual filesystem core did not deal with "devices", nor did it have "drivers" that could be attached to its objects. So he decided to refactor the code a little and make it simpler.

Viro persuaded Mochel to create something called struct kobject. This structure had the basic properties of struct device but was smaller, and it had no "driver" and "device" relationship. Its fields are as follows.

struct kobject {
    char name[KOBJ_NAME_LEN];
    atomic_t refcount;
    struct list_head entry;
    struct kobject *parent;
    struct subsystem *subsys;
    struct dentry *dentry;
};

This structure is a kind of empty object, with only the most basic ability to be reference counted and inserted into a hierarchy of objects.

struct device inherits all the functionality of this smaller "base" structure, struct kobject, by including it as a field.

struct device {
    struct list_head g_list;
    struct list_head node;
    struct list_head bus_list;
    struct list_head driver_list;
    struct list_head children;
    struct list_head intf_list;
    struct device *parent;
    struct kobject kobj;
    char bus_id[BUS_ID_SIZE];
};

The following macro is used to convert from a kobject back to a struct device. It simply uses the container_of macro shown earlier.

#define to_dev(obj) container_of(obj, struct device, kobj)

During this development, many other people were working to make the Linux kernel more robust so that it would scale and allow ever more processors to run in the same system image. As a result, many developers were adding reference counts to their own structures to handle their memory usage properly, and each of them had to duplicate the code to initialize, increment, decrement, and finalize the count.

So it was decided to pull this simple functionality out of struct kobject into a structure of its own; the result was struct kref.

struct kref {
    atomic_t refcount;
};

struct kref has only three simple functions. kref_init initializes the reference count, kref_get increments it, and kref_put decrements it. The first two are very simple; only the last is worth a look.

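int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
    WARN_ON(release == NULL);
    WARN_ON(release == (void (*)(struct kref *))kfree);

    if (atomic_dec_and_test(&kref->refcount)) {
        release(kref);
        return 1;
    }
    return 0;
}

kref_put takes two parameters: a pointer to the struct kref whose reference count is to be decremented, and a pointer to the release function to call when the object is no longer referenced.

The first two lines of the function were not there when struct kref was first added to the kernel; they were added because of programmers who deliberately tried to dodge reference counting. They either passed no release function pointer at all or, once the kernel complained about that, passed a pointer to the basic kfree function.

Once the arguments pass these two checks, kref_put decrements the reference count atomically. If this was the last reference to the object, it calls the given release function and returns 1; otherwise it returns 0. The return value only tells the caller whether it held the last reference to the object, not whether the object still exists in memory (someone else may free the object after the call returns, so the return value cannot guarantee that the object still exists).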
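The same pattern can be tried in user space with C11 atomics. The sketch below is an analogue written for illustration, not kernel code: struct ref, ref_init, ref_get, ref_put, struct session and session_release are all hypothetical names, and the release function only sets a flag so that its effect can be observed.

```c
#include <stdatomic.h>
#include <stddef.h>

struct ref {                         /* hypothetical analogue of struct kref */
    atomic_int refcount;
};

void ref_init(struct ref *r) { atomic_init(&r->refcount, 1); }

void ref_get(struct ref *r)  { atomic_fetch_add(&r->refcount, 1); }

/* Drop one reference; call release() and return 1 if it was the last one */
int ref_put(struct ref *r, void (*release)(struct ref *r))
{
    /* atomic_fetch_sub returns the value held before the decrement */
    if (atomic_fetch_sub(&r->refcount, 1) == 1) {
        release(r);
        return 1;
    }
    return 0;
}

/* Example object embedding the counter, plus its release function */
struct session {
    struct ref ref;                  /* first member, so the cast below is valid */
    int released;                    /* set by the release function, for demonstration */
};

void session_release(struct ref *r)
{
    struct session *s = (struct session *)r;
    s->released = 1;
}
```

After ref_init followed by one ref_get the count is 2, so the first ref_put returns 0 and the second returns 1 and runs session_release.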

With the introduction of struct kref, struct kobject was changed to use it.

struct kobject {
    char name[KOBJ_NAME_LEN];
    struct kref kref;
};

As a result of embedding one structure inside another in this way, the original struct usb_interface now contains a struct device, which contains a struct kobject, which in turn contains a struct kref.

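-- Scaling up to thousands of devices

Linux runs on nearly every platform: mobile phones, radio-controlled helicopters, desktops, servers, and 73 percent of the world's largest supercomputers. Scalability of the driver model was therefore a very important issue and always one of our main concerns. As development went on, it helped in many ways that the structures used to hold devices, struct kobject and struct device, were relatively small.

On most systems, the number of devices attached is directly proportional to the size of the system.

A small embedded system has somewhere between one and ten devices attached (and present in the tree). A larger "enterprise" system has many more, but such systems also have plenty of memory, so the memory used by the additional devices is still a very small fraction of the kernel's overall memory usage.

There is, however, one class of "enterprise" system where this relaxed scaling model breaks down completely: the s390 mainframe. On these machines Linux runs in a virtual partition. Up to 1,024 Linux instances can run on a single machine at the same time, and each instance has an enormous number of different storage devices attached. The system as a whole has plenty of memory, but each virtual partition is given only a small slice of it. Each partition gets only a few hundred megabytes of RAM, yet it must still see all of the different storage devices (typically around 20,000).

On such systems the device tree took up a significant share of memory, memory that is never handed to user processes. It was time to put the driver model on a diet, and several very smart IBM kernel developers went to work on the problem.

In the process the developers found something rather surprising. The main struct device was only about 160 bytes (on a 32-bit processor), so even with 20,000 devices in the system those structures took up only 3 to 4 MB, which is quite manageable. What really ate the memory was sysfs, the RAM-based filesystem mentioned earlier that exposes all devices to userspace. For each device, sysfs created a struct inode and a struct dentry. Both are rather heavy structures: struct inode is about 256 bytes and struct dentry about 140 bytes.

At least one struct dentry and one struct inode were created for every struct device. Overall, a great many copies of these filesystem structures were generated.

For example, a single block device creates about ten different virtual files, so one 160-byte structure could end up taking as much as 4 KB. On a system with 20,000 devices, the virtual filesystem took about 80 MB. This memory was consumed by the kernel and could not be used by any user program. Considering that many programs never need the information stored in sysfs, this was a serious waste.

The solution was to rewrite the sysfs code so that the struct inode and struct dentry structures are kept in the kernel's caches and created on the fly when the filesystem is accessed. Concretely, instead of allocating everything up front when a device is first created, directories and files are created dynamically as a user walks the tree. Because these structures live in the kernel's main caches, when a memory request cannot otherwise be satisfied the system can empty the caches and hand the reclaimed memory to whoever asked for it. All of the changes took place in the sysfs backend code; the main struct device did not change at all.

-- Small objects loosely joined

The Linux driver model shows how the C language can be used to build a heavily object-oriented model of code by creating many small objects that each do one thing well. These objects can be embedded in other objects to build a very powerful and flexible tree of objects. Judging by actual usage, their memory footprint is minimal. This makes the Linux kernel flexible enough for the same code base to run on everything from tiny embedded systems to the largest supercomputers in the world.

The development of this model also shows two very interesting and powerful aspects of the way Linux kernel development works.

First, the process is very iterative. As the kernel's requirements changed, and as the requirements of the systems running on it changed, the developers found ways to identify and abstract the parts of the model that needed to become more efficient. This was a response to the basic evolutionary need of a system to survive in a changing environment.

Second, the history of device handling shows that the process is extremely collaborative. Different developers come up with different ideas for improving and extending various aspects of the kernel. Others can see, through the source code, the developers' intent exactly as they expressed it, and then help change the code in ways the original developers never considered. The result of this collaboration is that the goals of many different developers are met by a common solution that none of them would have found working alone.

These two characteristics of kernel driver model development have helped Linux develop into the most flexible and powerful operating system created so far. As long as they are maintained, Linux will keep that position.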

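---- After reading this chapter

I find myself wrapping up with something like a book report.

I started copying the book's text out like this because, just skimming it, I could not really grasp what it was trying to say,

so I transcribed it in order to read it carefully.

What struck me first is that the chapter focuses less on the merits of the finished driver model

and more on how the model evolved as developers worked together, and that is the part that resonated with me as a developer.

Where most books deal mainly with users or with the finished product, this one showed a platform being improved through the efforts of its developers. The technical content is not especially shocking, but it strengthened my resolve not to neglect looking closely at the small things in my daily work and searching for what could be improved.

To all the developers who started coding for fun back in their student days!

It is still more enjoyable than doing anything else, isn't it? Let's enjoy it!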


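Linux kernel: the NMI watchdog

하늘을 나는 미카 2009. 12. 31. 16:36

NMI watchdog check

A watchdog system for kernel developers on multiprocessor systems.

This watchdog system is useful for detecting kernel bugs that freeze the system.

The watchdog relies on a clever hardware feature of the local and I/O APICs, which can periodically raise an NMI interrupt on every CPU.
Since NMI interrupts cannot be masked with the cli assembly instruction, the watchdog can detect a deadlock even when interrupts are disabled.

Slab allocator

It views memory areas as objects consisting of a set of data structures plus a pair of methods called the constructor and the destructor.
The constructor initializes the memory area, and the destructor cleans it up.
To avoid initializing objects repeatedly, the slab allocator does not discard objects that were allocated and then released; it keeps them in memory.
When a new object is requested, it can be taken from memory without being initialized again.

Kernel functions tend to request memory areas of the same type repeatedly,
for example process descriptors, open file objects, and other fixed-size table-like memory areas.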
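The reuse idea can be sketched in a few lines of user-space C. This is a deliberately simplified illustration, not how the kernel's slab allocator is implemented: struct obj_cache, cache_init, cache_alloc, cache_free, CACHE_SLOTS and the counting constructor are all hypothetical. It assumes, as the slab allocator does, that objects are handed back in their constructed state.

```c
#include <stdlib.h>

#define CACHE_SLOTS 16

struct obj_cache {                        /* hypothetical, much simplified cache */
    size_t obj_size;
    void (*ctor)(void *obj);              /* runs only when fresh memory is obtained */
    void *free_list[CACHE_SLOTS];         /* constructed objects waiting for reuse */
    int nfree;
};

void cache_init(struct obj_cache *c, size_t obj_size, void (*ctor)(void *))
{
    c->obj_size = obj_size;
    c->ctor = ctor;
    c->nfree = 0;
}

void *cache_alloc(struct obj_cache *c)
{
    if (c->nfree > 0)
        return c->free_list[--c->nfree];  /* reuse: no constructor call */
    void *obj = malloc(c->obj_size);
    if (obj != NULL && c->ctor != NULL)
        c->ctor(obj);
    return obj;
}

void cache_free(struct obj_cache *c, void *obj)
{
    if (c->nfree < CACHE_SLOTS)
        c->free_list[c->nfree++] = obj;   /* keep it, still initialized */
    else
        free(obj);                        /* cache full: really release it */
}

/* Demonstration constructor that counts how often it runs */
int ctor_calls = 0;

void zero_int_ctor(void *obj)
{
    *(int *)obj = 0;
    ctor_calls++;
}
```

Allocating, freeing and allocating again from such a cache returns the same object the second time, and the constructor runs only once.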


Passing information from the kernel to user mode

하늘을 나는 미카 2009. 12. 30. 09:59

Kernel Korner - Why and How to Use Netlink Socket

January 5th, 2005 by Kevin Kaichuan He in SysAdmin

Use this bidirectional, versatile method to pass data between kernel and user space.

Due to the complexity of developing and maintaining the kernel, only the most essential and performance-critical code is placed in the kernel. Other things, such as GUI, management and control code, typically are programmed as user-space applications. This practice of splitting the implementation of certain features between kernel and user space is quite common in Linux. Now the question is how can kernel code and user-space code communicate with each other?

The answer is the various IPC methods that exist between kernel and user space, such as system call, ioctl, proc filesystem or netlink socket. This article discusses netlink socket and reveals its advantages as a network feature-friendly IPC.

Introduction

Netlink socket is a special IPC used for transferring information between kernel and user-space processes. It provides a full-duplex communication link between the two by way of standard socket APIs for user-space processes and a special kernel API for kernel modules. Netlink socket uses the address family AF_NETLINK, as compared to AF_INET used by TCP/IP socket. Each netlink socket feature defines its own protocol type in the kernel header file include/linux/netlink.h.

The following is a subset of features and their protocol types currently supported by the netlink socket:

NETLINK_ROUTE: communication channel between user-space routing dæmons, such as BGP, OSPF, RIP and kernel packet forwarding module. User-space routing dæmons update the kernel routing table through this netlink protocol type.
NETLINK_FIREWALL: receives packets sent by the IPv4 firewall code.
NETLINK_NFLOG: communication channel for the user-space iptable management tool and kernel-space Netfilter module.
NETLINK_ARPD: for managing the arp table from user space.

Why do the above features use netlink instead of system calls, ioctls or proc filesystems for communication between user and kernel worlds? It is a nontrivial task to add system calls, ioctls or proc files for new features; we risk polluting the kernel and damaging the stability of the system. Netlink socket is simple, though: only a constant, the protocol type, needs to be added to netlink.h. Then, the kernel module and application can talk using socket-style APIs immediately.

Netlink is asynchronous because, as with any other socket API, it provides a socket queue to smooth the burst of messages. The system call for sending a netlink message queues the message to the receiver's netlink queue and then invokes the receiver's reception handler. The receiver, within the reception handler's context, can decide whether to process the message immediately or leave the message in the queue and process it later in a different context. Unlike netlink, system calls require synchronous processing. Therefore, if we use a system call to pass a message from user space to the kernel, the kernel scheduling granularity may be affected if the time to process that message is long.

The code implementing a system call in the kernel is linked statically to the kernel at compilation time; thus, it is not appropriate to include system call code in a loadable module, which is the case for most device drivers. With netlink socket, no compilation time dependency exists between the netlink core of Linux kernel and the netlink application living in loadable kernel modules.

Netlink socket supports multicast, which is another benefit over system calls, ioctls and proc. One process can multicast a message to a netlink group address, and any number of other processes can listen to that group address. This provides a near-perfect mechanism for event distribution from kernel to user space.

System call and ioctl are simplex IPCs in the sense that a session for these IPCs can be initiated only by user-space applications. But, what if a kernel module has an urgent message for a user-space application? There is no way of doing that directly using these IPCs. Normally, applications periodically need to poll the kernel to get the state changes, although intensive polling is expensive. Netlink solves this problem gracefully by allowing the kernel to initiate sessions too. We call it the duplex characteristic of the netlink socket.

Finally, netlink socket provides a BSD socket-style API that is well understood by the software development community. Therefore, training costs are less as compared to using the rather cryptic system call APIs and ioctls.

Relating to the BSD Routing Socket

In BSD TCP/IP stack implementation, there is a special socket called the routing socket. It has an address family of AF_ROUTE, a protocol family of PF_ROUTE and a socket type of SOCK_RAW. The routing socket in BSD is used by processes to add or delete routes in the kernel routing table.

In Linux, the equivalent function of the routing socket is provided by the netlink socket protocol type NETLINK_ROUTE. Netlink socket provides a functionality superset of BSD's routing socket.

Netlink Socket APIs

The standard socket APIs—socket(), sendmsg(), recvmsg() and close()—can be used by user-space applications to access netlink socket. Consult the man pages for detailed definitions of these APIs. Here, we discuss how to choose parameters for these APIs only in the context of netlink socket. The APIs should be familiar to anyone who has written an ordinary network application using TCP/IP sockets.

To create a socket with socket(), enter:

int socket(int domain, int type, int protocol)

The socket domain (address family) is AF_NETLINK, and the type of socket is either SOCK_RAW or SOCK_DGRAM, because netlink is a datagram-oriented service.

The protocol (protocol type) selects for which netlink feature the socket is used. The following are some predefined netlink protocol types: NETLINK_ROUTE, NETLINK_FIREWALL, NETLINK_ARPD, NETLINK_ROUTE6 and NETLINK_IP6_FW. You also can add your own netlink protocol type easily.

Up to 32 multicast groups can be defined for each netlink protocol type. Each multicast group is represented by a bit mask, 1<<i, where 0<=i<=31. This is extremely useful when a group of processes and the kernel process coordinate to implement the same feature—sending multicast netlink messages can reduce the number of system calls used and alleviate applications from the burden of maintaining the multicast group membership.
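The bit-mask representation can be written as a small helper. This is a sketch only; the function name nl_group_mask is made up here, and returning 0 for an out-of-range index is a choice made for this example.

```c
#include <stdint.h>

/* Bit mask for multicast group i, where 0 <= i <= 31.
 * Returns 0 if i is out of range. */
uint32_t nl_group_mask(unsigned int i)
{
    if (i > 31)
        return 0;
    return (uint32_t)1 << i;
}
```

Masks for several groups can be combined with bitwise OR, for example nl_group_mask(0) | nl_group_mask(3) for groups 0 and 3.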

bind()

As for a TCP/IP socket, the netlink bind() API associates a local (source) socket address with the opened socket. The netlink address structure is as follows:

struct sockaddr_nl

{

  sa_family_t    nl_family;  /* AF_NETLINK   */

  unsigned short nl_pad;     /* zero         */

  __u32          nl_pid;     /* process pid */

  __u32          nl_groups;  /* mcast groups mask */

} nladdr;

When used with bind(), the nl_pid field of the sockaddr_nl can be filled with the calling process' own pid. The nl_pid serves here as the local address of this netlink socket. The application is responsible for picking a unique 32-bit integer to fill in nl_pid:

NL_PID Formula 1:  nl_pid = getpid();

Formula 1 uses the process ID of the application as nl_pid, which is a natural choice if, for the given netlink protocol type, only one netlink socket is needed for the process.

In scenarios where different threads of the same process want to have different netlink sockets opened under the same netlink protocol, Formula 2 can be used to generate the nl_pid:

NL_PID Formula 2: pthread_self() << 16 | getpid();

In this way, different pthreads of the same process each can have their own netlink socket for the same netlink protocol type. In fact, even within a single pthread it's possible to create multiple netlink sockets for the same protocol type. Developers need to be more creative, however, in generating a unique nl_pid, and we don't consider this to be a normal-use case.

If the application wants to receive netlink messages of the protocol type that are destined for certain multicast groups, the bitmasks of all the interested multicast groups should be ORed together to form the nl_groups field of sockaddr_nl. Otherwise, nl_groups should be zeroed out so the application receives only the unicast netlink message of the protocol type destined for the application. After filling in the nladdr, do the bind as follows:

bind(fd, (struct sockaddr*)&nladdr, sizeof(nladdr));

Sending a Netlink Message

In order to send a netlink message to the kernel or other user-space processes, another struct sockaddr_nl nladdr needs to be supplied as the destination address, the same as sending a UDP packet with sendmsg(). If the message is destined for the kernel, both nl_pid and nl_groups should be supplied with 0.

If the message is a unicast message destined for another process, the nl_pid is the other process' pid and nl_groups is 0, assuming NL_PID Formula 1 is used in the system.

If the message is a multicast message destined for one or multiple multicast groups, the bitmasks of all the destination multicast groups should be ORed together to form the nl_groups field. We then can supply the netlink address to the struct msghdr msg for the sendmsg() API, as follows:

struct msghdr msg;

msg.msg_name = (void *)&(nladdr);

msg.msg_namelen = sizeof(nladdr);

The netlink socket requires its own message header as well. This is for providing a common ground for netlink messages of all protocol types.

Because the Linux kernel netlink core assumes the existence of the following header in each netlink message, an application must supply this header in each netlink message it sends:

struct nlmsghdr

{

  __u32 nlmsg_len;   /* Length of message */

  __u16 nlmsg_type;  /* Message type*/

  __u16 nlmsg_flags; /* Additional flags */

  __u32 nlmsg_seq;   /* Sequence number */

  __u32 nlmsg_pid;   /* Sending process PID */

};

nlmsg_len has to be completed with the total length of the netlink message, including the header, and is required by netlink core. nlmsg_type can be used by applications and is an opaque value to netlink core. nlmsg_flags is used to give additional control to a message; it is read and updated by netlink core. nlmsg_seq and nlmsg_pid are used by applications to track the message, and they are opaque to netlink core as well.

A netlink message thus consists of nlmsghdr and the message payload. Once the message has been composed in a buffer pointed to by the nlh pointer, it can be attached to the struct msghdr msg through an iovec:

struct iovec iov;


iov.iov_base = (void *)nlh;

iov.iov_len = nlh->nlmsg_len;


msg.msg_iov = &iov;

msg.msg_iovlen = 1;

After the above steps, a call to sendmsg() kicks out the netlink message:

sendmsg(fd, &msg, 0);
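For completeness, the buffer that nlh points to might be composed roughly as follows. This is a sketch under stated assumptions: the helper name build_nl_msg is hypothetical, the payload is taken to be a NUL-terminated string, and the sequence number is arbitrary. The NLMSG_SPACE, NLMSG_LENGTH and NLMSG_DATA macros come from the Linux header linux/netlink.h.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate and fill one netlink message carrying a text payload.
 * The caller frees the result; returns NULL on allocation failure. */
struct nlmsghdr *build_nl_msg(const char *text)
{
    size_t payload_len = strlen(text) + 1;          /* include the NUL */
    struct nlmsghdr *nlh = calloc(1, NLMSG_SPACE(payload_len));
    if (nlh == NULL)
        return NULL;

    nlh->nlmsg_len = NLMSG_LENGTH(payload_len);     /* header + payload */
    nlh->nlmsg_type = 0;                            /* left to the application */
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_seq = 1;                             /* arbitrary for this sketch */
    nlh->nlmsg_pid = (__u32)getpid();               /* sending process */
    memcpy(NLMSG_DATA(nlh), text, payload_len);     /* payload follows the header */
    return nlh;
}
```

The returned pointer can serve as nlh in the iovec setup shown above, with iov_len taken from nlh->nlmsg_len.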

Receiving Netlink Messages

A receiving application needs to allocate a buffer large enough to hold netlink message headers and message payloads. It then fills the struct msghdr msg as shown below and uses the standard recvmsg() to receive the netlink message, assuming the buffer is pointed to by nlh:

struct sockaddr_nl nladdr;

struct msghdr msg;

struct iovec iov;


iov.iov_base = (void *)nlh;

iov.iov_len = MAX_NL_MSG_LEN;

msg.msg_name = (void *)&(nladdr);

msg.msg_namelen = sizeof(nladdr);


msg.msg_iov = &iov;

msg.msg_iovlen = 1;

recvmsg(fd, &msg, 0);

After the message has been received correctly, the nlh should point to the header of the just-received netlink message. nladdr should hold the destination address of the received message, which consists of the pid and the multicast groups to which the message is sent. And, the macro NLMSG_DATA(nlh), defined in netlink.h, returns a pointer to the payload of the netlink message. A call to close(fd) closes the netlink socket identified by file descriptor fd.

Kernel-Space Netlink APIs

The kernel-space netlink API is supported by the netlink core in the kernel, net/netlink/af_netlink.c. From the kernel side, the API is different from the user-space API. The API can be used by kernel modules to access the netlink socket and to communicate with user-space applications. Unless you leverage the existing netlink socket protocol types, you need to add your own protocol type by adding a constant to netlink.h. For example, we can add a netlink protocol type for testing purposes by inserting this line into netlink.h:

#define NETLINK_TEST  17

Afterward, you can reference the added protocol type anywhere in the Linux kernel.

In user space, we call socket() to create a netlink socket, but in kernel space, we call the following API:

struct sock *

netlink_kernel_create(int unit,

           void (*input)(struct sock *sk, int len));

The parameter unit is, in fact, the netlink protocol type, such as NETLINK_TEST. The function pointer, input, is a callback function invoked when a message arrives at this netlink socket.

After the kernel has created a netlink socket for protocol NETLINK_TEST, whenever user space sends a netlink message of the NETLINK_TEST protocol type to the kernel, the callback function, input(), which is registered by netlink_kernel_create(), is invoked. The following is an example implementation of the callback function input:

void input (struct sock *sk, int len)
{
  struct sk_buff *skb;
  struct nlmsghdr *nlh = NULL;
  u8 *payload = NULL;

  while ((skb = skb_dequeue(&sk->receive_queue)) != NULL) {
    /* process netlink message pointed by skb->data */
    nlh = (struct nlmsghdr *)skb->data;
    payload = NLMSG_DATA(nlh);
    /* process netlink message with header pointed by
     * nlh and payload pointed by payload
     */
  }
}

This input() function is called in the context of the sendmsg() system call invoked by the sending process. It is okay to process the netlink message inside input() if the processing is fast. When processing takes a long time, however, we want to keep it out of input() to avoid blocking other system calls from entering the kernel. Instead, we can use a dedicated kernel thread that repeats two steps indefinitely: first, call skb = skb_recv_datagram(nl_sk, 0, 0, &err), where nl_sk is the netlink socket returned by netlink_kernel_create(); then, process the netlink message pointed to by skb->data.

This kernel thread sleeps when there is no netlink message in nl_sk. Thus, inside the callback function input(), we need to wake up only the sleeping kernel thread, like this:

void input (struct sock *sk, int len)
{
  wake_up_interruptible(sk->sleep);
}

This is a more scalable communication model between user space and kernel. It also improves the granularity of context switches.

Sending Netlink Messages from the Kernel

Just as in user space, the source netlink address and destination netlink address need to be set when sending a netlink message. Assuming the socket buffer holding the netlink message to be sent is struct sk_buff *skb, the local address can be set with:

NETLINK_CB(skb).groups = local_groups;
NETLINK_CB(skb).pid = 0;   /* from kernel */

The destination address can be set like this:

NETLINK_CB(skb).dst_groups = dst_groups;
NETLINK_CB(skb).dst_pid = dst_pid;

Such information is not stored in skb->data. Rather, it is stored in the netlink control block of the socket buffer, skb.

To send a unicast message, use:

int
netlink_unicast(struct sock *ssk, struct sk_buff *skb,
                u32 pid, int nonblock);

where ssk is the netlink socket returned by netlink_kernel_create(), skb->data points to the netlink message to be sent and pid is the receiving application's pid, assuming NLPID Formula 1 is used. nonblock indicates whether the API should block when the receiving buffer is unavailable or immediately return a failure.

You also can send a multicast message. The following API delivers a netlink message to both the process specified by pid and the multicast groups specified by group:

void
netlink_broadcast(struct sock *ssk, struct sk_buff *skb,
                  u32 pid, u32 group, int allocation);

group is the bitwise OR of the bitmasks of all the receiving multicast groups. allocation is the kernel memory allocation type: typically, GFP_ATOMIC is used when calling from interrupt context and GFP_KERNEL otherwise. The allocation type matters because the API may need to allocate one or more socket buffers to clone the multicast message.
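
To make the group bitmask concrete, here is a small sketch that maps a multicast group number to its bit and ORs several groups together. The helper name nl_group_mask is hypothetical; the mapping assumed here (group 1 is bit 0) matches the "1<<0" used for group 1 in the examples of this article.

```c
#include <stdint.h>

/* Hypothetical helper: bitmask for multicast group number n (1..32).
 * Group 1 maps to bit 0, group 2 to bit 1, and so on. */
static uint32_t nl_group_mask(unsigned int n)
{
    if (n < 1 || n > 32)
        return 0;               /* no such group in a 32-bit mask */
    return (uint32_t)1 << (n - 1);
}
```

For instance, nl_group_mask(1) | nl_group_mask(3) yields 0x5, a value that could be passed as group to address groups 1 and 3 at once.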

Closing a Netlink Socket from the Kernel

Given the struct sock *nl_sk returned by netlink_kernel_create(), we can call the following kernel API to close the netlink socket in the kernel:

sock_release(nl_sk->socket);

So far, we have shown only the bare minimum code framework to illustrate the concept of netlink programming. We now will use our NETLINK_TEST netlink protocol type and assume it already has been added to the kernel header file. The kernel module code listed here contains only the netlink-relevant part, so it should be inserted into a complete kernel module skeleton, which you can find from many other reference sources.

Unicast Communication between Kernel and Application

In this example, a user-space process sends a netlink message to the kernel module, and the kernel module echoes the message back to the sending process. Here is the user-space code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define MAX_PAYLOAD 1024  /* maximum payload size */
struct sockaddr_nl src_addr, dest_addr;
struct nlmsghdr *nlh = NULL;
struct iovec iov;
struct msghdr msg;  /* global, so zero-initialized */
int sock_fd;

int main(void) {
 sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_TEST);

 memset(&src_addr, 0, sizeof(src_addr));
 src_addr.nl_family = AF_NETLINK;
 src_addr.nl_pid = getpid();  /* self pid */
 src_addr.nl_groups = 0;  /* not in mcast groups */
 bind(sock_fd, (struct sockaddr*)&src_addr,
      sizeof(src_addr));

 memset(&dest_addr, 0, sizeof(dest_addr));
 dest_addr.nl_family = AF_NETLINK;
 dest_addr.nl_pid = 0;   /* For Linux Kernel */
 dest_addr.nl_groups = 0; /* unicast */

 nlh = (struct nlmsghdr *)malloc(
                 NLMSG_SPACE(MAX_PAYLOAD));
 /* clear the header fields we do not set below */
 memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
 /* Fill the netlink message header */
 nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
 nlh->nlmsg_pid = getpid();  /* self pid */
 nlh->nlmsg_flags = 0;
 /* Fill in the netlink message payload */
 strcpy(NLMSG_DATA(nlh), "Hello you!");

 iov.iov_base = (void *)nlh;
 iov.iov_len = nlh->nlmsg_len;
 msg.msg_name = (void *)&dest_addr;
 msg.msg_namelen = sizeof(dest_addr);
 msg.msg_iov = &iov;
 msg.msg_iovlen = 1;

 sendmsg(sock_fd, &msg, 0);

 /* Read message from kernel */
 memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
 recvmsg(sock_fd, &msg, 0);
 printf(" Received message payload: %s\n",
        (char *)NLMSG_DATA(nlh));

 /* Close Netlink Socket */
 close(sock_fd);
 return 0;
}

And, here is the kernel code:

struct sock *nl_sk = NULL;

void nl_data_ready (struct sock *sk, int len)
{
  wake_up_interruptible(sk->sleep);
}

void netlink_test() {
 struct sk_buff *skb = NULL;
 struct nlmsghdr *nlh = NULL;
 int err;
 u32 pid;

 nl_sk = netlink_kernel_create(NETLINK_TEST,
                               nl_data_ready);
 /* wait for message coming down from user-space */
 skb = skb_recv_datagram(nl_sk, 0, 0, &err);

 nlh = (struct nlmsghdr *)skb->data;
 printk("%s: received netlink message payload:%s\n",
        __FUNCTION__, (char *)NLMSG_DATA(nlh));

 pid = nlh->nlmsg_pid; /* pid of sending process */
 NETLINK_CB(skb).groups = 0; /* not in mcast group */
 NETLINK_CB(skb).pid = 0;      /* from kernel */
 NETLINK_CB(skb).dst_pid = pid;
 NETLINK_CB(skb).dst_groups = 0;  /* unicast */
 netlink_unicast(nl_sk, skb, pid, MSG_DONTWAIT);
 sock_release(nl_sk->socket);
}

After loading the kernel module that executes the kernel code above, when we run the user-space executable, we should see the following dumped from the user-space program:

Received message payload: Hello you!

And, the following message should appear in the output of dmesg:

netlink_test: received netlink message payload:Hello you!
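
The user-space listing above sets nlmsg_len to NLMSG_SPACE(MAX_PAYLOAD), so the whole 1024-byte payload area is sent even for a short string. As a hedged alternative, not the article's code, the sketch below sizes the message to the actual payload: NLMSG_LENGTH gives the header plus payload, and NLMSG_SPACE rounds that up to the 4-byte alignment used for allocation. The helper name nl_build_text is hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: allocate a netlink message that carries text
 * (including its terminating NUL) as payload. The caller frees it. */
static struct nlmsghdr *nl_build_text(const char *text, pid_t sender)
{
    size_t payload = strlen(text) + 1;
    /* NLMSG_SPACE = aligned header + payload rounded up to 4 bytes */
    struct nlmsghdr *nlh = calloc(1, NLMSG_SPACE(payload));

    if (nlh == NULL)
        return NULL;
    /* NLMSG_LENGTH = aligned header + payload, without trailing pad */
    nlh->nlmsg_len = NLMSG_LENGTH(payload);
    nlh->nlmsg_pid = (unsigned int)sender;
    memcpy(NLMSG_DATA(nlh), text, payload);
    return nlh;
}
```

With the 16-byte netlink header, "Hello you!" (11 bytes with its NUL) gives an nlmsg_len of 27 and an allocation of 28 bytes, compared with 1040 bytes for NLMSG_SPACE(1024).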

Multicast Communication between Kernel and Applications

In this example, two user-space applications are listening to the same netlink multicast group. The kernel module sends a message through the netlink socket to the multicast group, and all the applications receive it. Here is the user-space code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define MAX_PAYLOAD 1024  /* maximum payload size */
struct sockaddr_nl src_addr, dest_addr;
struct nlmsghdr *nlh = NULL;
struct iovec iov;
struct msghdr msg;  /* global, so zero-initialized */
int sock_fd;

int main(void) {
 sock_fd=socket(PF_NETLINK, SOCK_RAW, NETLINK_TEST);

 memset(&src_addr, 0, sizeof(src_addr));
 src_addr.nl_family = AF_NETLINK;
 src_addr.nl_pid = getpid();  /* self pid */
 /* interested in group 1<<0 */
 src_addr.nl_groups = 1;
 bind(sock_fd, (struct sockaddr*)&src_addr,
      sizeof(src_addr));

 memset(&dest_addr, 0, sizeof(dest_addr));

 nlh = (struct nlmsghdr *)malloc(
                          NLMSG_SPACE(MAX_PAYLOAD));
 memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));

 iov.iov_base = (void *)nlh;
 iov.iov_len = NLMSG_SPACE(MAX_PAYLOAD);
 msg.msg_name = (void *)&dest_addr;
 msg.msg_namelen = sizeof(dest_addr);
 msg.msg_iov = &iov;
 msg.msg_iovlen = 1;

 printf("Waiting for message from kernel\n");

 /* Read message from kernel */
 recvmsg(sock_fd, &msg, 0);
 printf(" Received message payload: %s\n",
        (char *)NLMSG_DATA(nlh));
 close(sock_fd);
 return 0;
}

And, here is the kernel code:

#define MAX_PAYLOAD 1024
struct sock *nl_sk = NULL;

/* same wake-up callback as in the unicast example */
void nl_data_ready (struct sock *sk, int len)
{
  wake_up_interruptible(sk->sleep);
}

void netlink_test() {
 struct sk_buff *skb = NULL;
 struct nlmsghdr *nlh;
 int err;

 nl_sk = netlink_kernel_create(NETLINK_TEST,
                               nl_data_ready);
 skb=alloc_skb(NLMSG_SPACE(MAX_PAYLOAD),GFP_KERNEL);
 /* reserve the message area; without skb_put() the
  * data length of the skb stays 0
  */
 nlh = (struct nlmsghdr *)skb_put(skb,
                          NLMSG_SPACE(MAX_PAYLOAD));
 nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
 nlh->nlmsg_pid = 0;  /* from kernel */
 nlh->nlmsg_flags = 0;
 strcpy(NLMSG_DATA(nlh), "Greeting from kernel!");
 /* sender is in group 1<<0 */
 NETLINK_CB(skb).groups = 1;
 NETLINK_CB(skb).pid = 0;  /* from kernel */
 NETLINK_CB(skb).dst_pid = 0;  /* multicast */
 /* to mcast group 1<<0 */
 NETLINK_CB(skb).dst_groups = 1;

 /* multicast the message to all listening processes */
 netlink_broadcast(nl_sk, skb, 0, 1, GFP_KERNEL);
 sock_release(nl_sk->socket);
}

Assuming the user-space code is compiled into the executable nl_recv, we can run two instances of nl_recv:

./nl_recv &

Waiting for message from kernel

./nl_recv &

Waiting for message from kernel

Then, after we load the kernel module that executes the kernel-space code, both instances of nl_recv should receive the following message:

Received message payload: Greeting from kernel!

Received message payload: Greeting from kernel!

Conclusion

Netlink socket is a flexible interface for communication between user-space applications and kernel modules. It provides an easy-to-use socket API to both applications and the kernel. It provides advanced communication features, such as full-duplex, buffered I/O, multicast and asynchronous communication, which are absent in other kernel/user-space IPCs.

Kevin Kaichuan He (hek_u5@yahoo.com) is a principal software engineer at Solustek Corp. He currently is working on embedded system, device driver and networking protocols projects. His previous work experience includes senior software engineer at Cisco Systems and research assistant at CS, Purdue University. In his spare time, he enjoys digital photography, PS2 games and literature.

Help regarding Netlink sockets for 2.6 kernels: Kernel module

On November 25th, 2009 P (not verified) says:

Hi,
I'm a netlink newbie developing a kernel module (as stated above) for 2.6.x kernels.
I'm simply trying to pass a (any) message between the user space and the kernel.

The changes to the netlink APIs from kernel to kernel are confusing.
Below is my kernel code:

#include <linux/module.h>

#include <linux/kernel.h>

#include <linux/init.h>

#include <net/sock.h>

#include <linux/socket.h>

#include <linux/net.h>

#include <asm/types.h>

#include <linux/netlink.h>

#include <linux/skbuff.h>


#define NETLINK_TEST 17

#define VFW_GROUP 0

#define MSG_SIZE NLMSG_SPACE(1024)




static struct sock *nl_sk = NULL;


static void nltest_rcv(struct sock *sk, int len)

{

        struct sk_buff *nl_skb;

        struct nlmsghdr *nl_hdr;

        int pid;

        while ((nl_skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {

                nl_hdr = (struct nlmsghdr *)nl_skb->data;

                pid = nl_hdr->nlmsg_pid;

                printk(KERN_ALERT "*** Message from user with PID: (pid = %d) is %s\n", pid, (char*)NLMSG_DATA(nl_hdr));

                nl_skb = alloc_skb(MSG_SIZE, in_interrupt() ? GFP_ATOMIC : GFP_KERNEL);

                skb_put(nl_skb, MSG_SIZE);

                nl_hdr = (struct nlmsghdr *)nl_skb->data;

                nl_hdr->nlmsg_len = MSG_SIZE;

                nl_hdr->nlmsg_pid = pid;

                nl_hdr->nlmsg_flags = 0;

                strcpy(NLMSG_DATA(nl_hdr), "HELLO HELLO HELLO");

                NETLINK_CB(nl_skb).pid = 0;

                NETLINK_CB(nl_skb).dst_pid = pid;

                NETLINK_CB(nl_skb).dst_group = VFW_GROUP;

                netlink_unicast(nl_sk, nl_skb, pid, 0);

                kfree_skb(nl_skb);

        }

}




/* LXR reference:http://lxr.linux.no/#linux+v2.6.26.7/net/netlink/af_netlink.c#L1358

struct sock * netlink_kernel_create

(struct net *net, int unit, unsigned int groups,void (*input)(struct sk_buff *skb),struct mutex *cb_mutex, struct module *module)

*/




static int __init nltest_init(void)

{

	struct net *net;

        printk(KERN_ALERT "INIT/START: nltest\n");

        nl_sk = netlink_kernel_create(net,NETLINK_TEST, VFW_GROUP, nltest_rcv, 0, THIS_MODULE);

        if (!nl_sk) {

                printk(KERN_ALERT "ERROR: nltest - netlink_kernel_create() failed\n");

                return -1;

        }

        return 0;

}

static void __exit nltest_exit(void)

{

        printk(KERN_ALERT "EXIT: nltest\n");

        sock_release(nl_sk->sk_socket);


        return;

}

module_init(nltest_init);

module_exit(nltest_exit);


MODULE_DESCRIPTION("Module_Test_Netlink");

MODULE_LICENSE("GPL");

When I run make on this file, I get the following errors:

root@ubuntu:/home/p# make

make -C /lib/modules/2.6.27-15-generic/build M=/home/p modules

make[1]: Entering directory `/usr/src/linux-headers-2.6.27-15-generic'

  CC [M]  /home/priyanka/kern.o

/home/p/kern.c: In function ‘nltest_rcv’:

/home/p/kern.c:37: error: ‘struct netlink_skb_parms’ has no member named ‘dst_pid’

/home/p/kern.c: In function ‘nltest_init’:

/home/p/kern.c:54: warning: passing argument 4 of ‘netlink_kernel_create’ from incompatible pointer type

make[2]: *** [/home/p/kern.o] Error 1

make[1]: *** [_module_/home/p] Error 2

make[1]: Leaving directory `/usr/src/linux-headers-2.6.27-15-generic'

make: *** [all] Error 2

The two errors are for
1. netlink_kernel_create function
2. struct netlink_skb_parms

Can anyone help me figure out how to solve these errors?

Thanks!

netlink_kernel_create function example for kernel 2.6.29

On November 12th, 2009 Anonymous (not verified) says:

Can you provide an example of the netlink_kernel_create function for CentOS 5.1 with a 2.6.29 kernel?

Need working code for latest kernel

On April 13th, 2009 prashant bhole (not verified) says:

I tried to modify this for the latest kernel, but when the kernel sends a message to user space, the kernel hangs after a few seconds. I am not able to figure out the problem.

// KERNEL MODULE


#include 

#include 

#include 

#include 

#include 


#include 

#include 

#include 

#include 

#include 

#include 


DEFINE_MUTEX(mut);


#define KNETLINK_UNIT 17


static struct sock * knetlink_sk = NULL;

char data_string[] = "Hello Userspace! This is msg from Kernel";


int knetlink_process( struct sk_buff * skb, struct nlmsghdr *nlh )

{

        u8 * payload = NULL;

        int   payload_size;

        int   length;

        int   seq;

        pid_t pid;

        struct sk_buff * rskb;


        pid = nlh->nlmsg_pid;

        length = nlh->nlmsg_len;

        seq = nlh->nlmsg_seq;

        printk("\nknetlink_process: nlmsg len %d type %d pid %d seq %d\n",

                        length, nlh->nlmsg_type, pid, seq );

        /* process the payload */

        payload_size = nlh->nlmsg_len - NLMSG_LENGTH(0);

        if ( payload_size > 0 ) {

                payload = NLMSG_DATA( nlh );

                printk("\nknetlink_process: Payload is %s ", payload);

        }

        // reply

        rskb = alloc_skb( nlh->nlmsg_len, GFP_KERNEL );

        if ( rskb ) {

                memcpy( rskb->data, skb->data, length );

                skb_put( rskb, length );

                kfree_skb( skb );

        } else {

                printk("knetlink_process: replies with the same socket_buffer\n");

                rskb = skb;

        }

        memset(rskb->data, 0, length);

        nlh = (struct nlmsghdr *) rskb->data;

        nlh->nlmsg_len   = length;

        nlh->nlmsg_pid   = 0; //from kernel

        nlh->nlmsg_flags = NLM_F_REQUEST;

        nlh->nlmsg_type  = 2;

        nlh->nlmsg_seq   = seq+1;

        payload = NLMSG_DATA( nlh );


        printk("knetlink_process: reply nlmsg len %d type %d pid %d\n",

                        nlh->nlmsg_len, nlh->nlmsg_type, nlh->nlmsg_pid );

        strcpy(payload, data_string);

        *(payload + strlen(data_string)) = '\0';

        netlink_unicast( knetlink_sk, rskb, pid, MSG_DONTWAIT );

        return 0;

}


// USER SPACE CODE


#include 

#include 

#include 

#include 

#include 


#define KNETLINK_UNIT   17

#define MAX_PAYLOAD 1024  /* maximum payload size*/

struct sockaddr_nl src_addr, dst_addr;

struct msghdr msg;

struct nlmsghdr *nlh = NULL;

struct iovec iov;

int sock_fd;


char data_string[] = "Hello Kernel! This is user space message";


int main()

{

        sock_fd = socket(PF_NETLINK, SOCK_RAW, KNETLINK_UNIT);

        char *data = NULL;


        memset(&src_addr, 0, sizeof(src_addr));

        src_addr.nl_family = AF_NETLINK;

        src_addr.nl_pid = getpid();

        src_addr.nl_groups = 0; // no multicast

        bind(sock_fd, (struct sockaddr*)&src_addr, sizeof(src_addr));


        memset(&dst_addr, 0, sizeof(dst_addr));

        dst_addr.nl_family = AF_NETLINK;

        dst_addr.nl_pid = 0; // 0 means kernel

        dst_addr.nl_groups = 0; // no multicast


        nlh = (struct nlmsghdr *)malloc(NLMSG_SPACE(MAX_PAYLOAD));


        memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));

        /* Fill the netlink message header */

        nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);

        nlh->nlmsg_pid = getpid();

        // I dont know what to set nlmsg_flags and nlmsg_type

        nlh->nlmsg_flags = NLM_F_REQUEST;

        nlh->nlmsg_type = NLMSG_MIN_TYPE +1;




        strcpy(NLMSG_DATA(nlh), data_string);

        *((char*)NLMSG_DATA(nlh) + strlen(data_string)) = '\0';


        iov.iov_base = (void *)nlh;

        iov.iov_len = nlh->nlmsg_len;


        msg.msg_name = (void *)&dst_addr;

        msg.msg_namelen = sizeof(dst_addr);

        msg.msg_iov = &iov;

        msg.msg_iovlen = 1;


        sendmsg(sock_fd, &msg, 0);


        /* Read message from kernel */

        memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));

        recvmsg(sock_fd, &msg, 0);

        printf("\nReceived message payload: %s\n", NLMSG_DATA(nlh));


        close(sock_fd);


        return 1;

}

             

void knetlink_input( struct sk_buff * skb)

{

        mutex_lock(&mut);

        printk("\nFunction %s() called", __FUNCTION__);


        netlink_rcv_skb(skb, &knetlink_process);


        mutex_unlock(&mut);

}


int knetlink_init( void )

{

        if ( knetlink_sk != NULL ) {

                printk("knetlink_init: sock already present\n");

                return 1;

        }


        knetlink_sk = netlink_kernel_create(&init_net, KNETLINK_UNIT, 0, knetlink_input, NULL, THIS_MODULE);

        if ( knetlink_sk == NULL ) {

                printk("knetlink_init: sock fail\n");

                return 1;

        }


        printk("knetlink_init: sock %p\n", (void*)knetlink_sk );

        return 0;

}


void knetlink_exit( void )

{

        if ( knetlink_sk != NULL ) {

                printk("knetlink_exit: release sock %p\n", (void*)knetlink_sk);

                sock_release( knetlink_sk->sk_socket );

        } else {

                printk("knetlink_exit: warning sock is NULL\n");

        }


}


module_init( knetlink_init );

module_exit( knetlink_exit );


MODULE_LICENSE("GPL");

MODULE_AUTHOR("Prashant Bhole");

MODULE_DESCRIPTION("Netlink Demo");

Reading additional bytes from the netlink subsystem

On November 19th, 2008 Ravi kumar (not verified) says:

Hi,

Please let me know whether the following problem is a real issue.

I have written a program using generic netlinks to communicate to/from kernel space. The user program sends a string and expects two strings from the kernel.

The program is working well and as expected kernel sends two hello strings to the user.

But the problem is that the kernel is sending one more message on the same socket, which I don't expect.
That is, after the first two reads on the socket, the third should block until the kernel sends further messages. But instead of blocking, the third read in the user application reads a message which seems to be an error message from the kernel.

Please check below for the message prints from kernel.

On first socket read using recv(......)

38 00 00 00 30 00 00 00 1F 13 24 49 00 00 00 00 8 . . . 0 . . . . . $ I . . . .
01 01 00 00 22 00 01 00 68 65 6C 6C 6F 20 77 6F . . . . " . . . h e l l o w o
72 6C 64 20 66 72 6F 6D 20 6B 65 72 6E 65 6C 20 r l d f r o m k e r n e l
73 70 61 63 65 00 00 00 s p a c e . . .

second read

40 00 00 00 30 00 00 00 1F 13 24 49 00 00 00 00 @ . . . 0 . . . . . $ I . . . .
01 01 00 00 29 00 01 00 53 65 63 6F 6E 64 20 68 . . . . ) . . . S e c o n d h
65 6C 6C 6F 20 77 6F 72 6C 64 20 66 72 6F 6D 20 e l l o w o r l d f r o m
6B 65 72 6E 65 6C 20 73 70 61 63 65 00 00 00 00 k e r n e l s p a c e . . . .

Third read which should block is receiving following
message from kernel which I think is a bug.

24 00 00 00 02 00 00 00 1E 13 24 49 68 39 00 00 $ . . . . . . . . . $ I h 9 . .
00 00 00 00 30 00 00 00 30 00 05 00 1E 13 24 49 . . . . 0 . . . 0 . . . . . $ I
68 39 00 00 h 9 . .

The third message is not sent by my kernel generic driver but is always received in the user application.

Please help resolve the above issue.

Thanks in advance.....
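
Read as a struct nlmsghdr, the third dump above has length 0x24 and type 0x0002, which is NLMSG_ERROR. Its payload starts with an error code of 0, followed by a copy of a request header whose flags are 0x0005 (NLM_F_REQUEST | NLM_F_ACK). That pattern looks like an acknowledgement, which the kernel sends when a request carries NLM_F_ACK, rather than a stray message. The sketch below decodes the dumped bytes under that reading; the helper names are hypothetical, and the bytes are treated as little-endian, which is an assumption drawn from the dump itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* The 36 bytes of the third read, copied from the dump above. */
static const unsigned char third_read[36] = {
    0x24, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00,
    0x1E, 0x13, 0x24, 0x49, 0x68, 0x39, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x30, 0x00, 0x00, 0x00,
    0x30, 0x00, 0x05, 0x00, 0x1E, 0x13, 0x24, 0x49,
    0x68, 0x39, 0x00, 0x00
};

/* Little-endian field readers (assumed byte order of the dump). */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if buf holds an NLMSG_ERROR message
 * with error code 0 whose echoed request header had NLM_F_ACK set. */
static int nl_looks_like_ack(const unsigned char *buf, size_t len)
{
    const size_t hdr = 16;              /* sizeof(struct nlmsghdr) */
    const size_t need = hdr + 4 + hdr;  /* header + error + echoed header */

    if (len < need || le32(buf) < need)
        return 0;
    if (le16(buf + 4) != NLMSG_ERROR)   /* nlmsg_type */
        return 0;
    if (le32(buf + hdr) != 0)           /* nlmsgerr.error, 0 = success */
        return 0;
    /* nlmsg_flags of the echoed request header */
    return (le16(buf + hdr + 4 + 6) & NLM_F_ACK) != 0;
}
```

If this reading is right, the extra read would go away when the request is sent without NLM_F_ACK, or it can simply be consumed and ignored. This is inferred from the dump only and was not confirmed in the thread.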

Usage of netlinks for kernel to user space communication.

On November 10th, 2008 ravikumar (not verified) says:

Hi,

I am new to this networking field and usage of netlinks.
My requirement is to pass some data asynchronously to the kernel module from user space and vice versa.

I managed to satisfy my first requirement using netlink. That is, I have created my own generic family in the kernel with some registered operations. Using the netlink library, I managed to pass the data with the appropriate command to my corresponding kernel module.

But I doubt whether data from the kernel module can be passed to user space asynchronously using netlink.

Is there any way that I can register some callback functions in user space on the same netlink family I have created in the kernel for specific commands, and pass the data to user space?

If yes, please let me know how I can achieve it with the generic netlink infrastructure.
If not, I would be grateful for some hints on the alternatives.

Thanks in advance,
Ravi kumar

Through Netlink sockets,Kernel echoes! -->Only echoes possible?

On October 6th, 2008 Ajith Pullanikkat (not verified) says:

HI all,

Through Netlink sockets,Kernel echoes! -->Only echoes possible?

Rather than expecting an echo message from the kernel, can the user by some means ask the kernel to send an expected reply?
I will make things more clear.
Scenario:

I am expecting messages from the kernel on any USB plug-in. I am able to receive them through netlink sockets. But if the USB device is already plugged in, the kernel does not send a message to user space. Can I demand a USB plug-in/plug-out message from the kernel by sending my requirement through sendmsg()? :)

Responses appreciated.Thanks in advance.
Ajith

This code doesn't work

On January 19th, 2007 Stepchenko (not verified) says:

This code doesn't work correctly

#define MAX_PAYLOAD 1024
struct sock *nl_sk = NULL;

void netlink_test() {
 struct sk_buff *skb = NULL;
 struct nlmsghdr *nlh;
 int err;

 nl_sk = netlink_kernel_create(NETLINK_TEST, nl_data_ready);
 skb=alloc_skb(NLMSG_SPACE(MAX_PAYLOAD),GFP_KERNEL);
 nlh = (struct nlmsghdr *)skb->data;
 nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
 nlh->nlmsg_pid = 0; /* from kernel */
 nlh->nlmsg_flags = 0;
 strcpy(NLMSG_DATA(nlh), "Greeting from kernel!");
 NETLINK_CB(skb).groups = 1;
 NETLINK_CB(skb).pid = 0; /* from kernel */
 NETLINK_CB(skb).dst_pid = 0; /* multicast */
 NETLINK_CB(skb).dst_groups = 1;
 /*multicast the message to all listening processes*/
 netlink_broadcast(nl_sk, skb, 0, 1, GFP_KERNEL);
 sock_release(nl_sk->socket);
}

One important thing is missing here:

before strcpy we should call
skb_put(skb, NLMSG_SPACE(MAX_PAYLOAD))

or change

nlh = (struct nlmsghdr *)skb->data;
nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
nlh->nlmsg_pid = 0; /* from kernel */
nlh->nlmsg_flags = 0;

to
NLMSG_PUT(...)

Best regards

nlh = (struct nlmsghdr *)

On May 4th, 2007 kamo (not verified) says:

nlh = (struct nlmsghdr *) skb_put(skb, NLMSG_SPACE(MAX_PAYLOAD));

best regards

Kernel Module

On December 6th, 2006 Amit Sahrawat (not verified) says:

#include <linux/config.h>
#include <linux/socket.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/netlink.h>
#include <net/sock.h>

#define NETLINK_TEST 17

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Test");
MODULE_DESCRIPTION("Testing Kernel/User socket");

static int debug = 0;

module_param(debug, int, 0);
MODULE_PARM_DESC(debug, "Debug information (default 0)");

static struct sock *nl_sk = NULL;

static void nl_data_ready (struct sock *sk, int len)
{
wake_up_interruptible(sk->sk_sleep);
}

static void netlink_test()
{
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh = NULL;
int err;
u32 pid;

nl_sk = netlink_kernel_create(NETLINK_TEST, nl_data_ready);
skb = skb_recv_datagram(nl_sk, 0, 0, &err);

nlh = (struct nlmsghdr *)skb->data;
printk(KERN_INFO "%s: received netlink message payload: %s\n", __FUNCTION__, NLMSG_DATA(nlh));

pid = nlh->nlmsg_pid;
NETLINK_CB(skb).groups = 0;
NETLINK_CB(skb).pid = 0;
NETLINK_CB(skb).dst_pid = pid;
NETLINK_CB(skb).dst_groups = 0;
netlink_unicast(nl_sk, skb, pid, MSG_DONTWAIT);
sock_release(nl_sk->sk_socket);
}

static int __init my_module_init(void)
{
printk(KERN_INFO "Initializing Netlink Socket");
netlink_test();
return 0;
}

static void __exit my_module_exit(void)
{
printk(KERN_INFO "Goodbye");
}

module_init(my_module_init);
module_exit(my_module_exit);

Makefile contents:
obj-m := netkernel.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Install and test using:
insmod netkernel.ko

To check for messages, run:
tail /var/log/messages

Great article. Helped me

On September 11th, 2006 Anonymous (not verified) says:

Great article. Helped me a lot to get my code ported to 2.6. Thanks.

Should the Linux kernel be recompiled?

On July 19th, 2006 Eswari (not verified) says:

Hi,
Should I recompile the Linux kernel after writing the kernel module for netlink? If so, could you tell me how to do it? I could not understand how the kernel module for netlink gets activated. I want to send certain packets (coming from certain IP addresses) to my application residing in user space. To filter the messages I want to use iptables. How will the messages filtered by iptables reach the netlink kernel module, so that they can be sent from there to my user-space application?

Could some one help me
Thanks
Eswari

Netlink is not the silver bullet

On June 19th, 2006 A concerned app programmer (not verified) says:

Shell/PERL/etc apps can use /proc on any distro without having to rebuild the app or worry about library incompatibilities. Conscientious developers are cautious when changing the /proc contents since hundreds of apps could be using the information... The netlink infrastructure may be more efficient, but how would the wealth of information provided by /proc be made available to system administrators as easily as /proc is via cat, less, or grep? How can I get information from netlink using those applications? The /proc support does have its advantages. Netlink is not the silver bullet.

However, this is a great netlink article!

need for compiled example

On August 27th, 2005 majid taghiloo (not verified) says:

Thanks for your article; it is very useful. I tried to create a communication socket between a kernel module and a userland program using your proposed code, but it did not work correctly. My module and userland code compile without errors, but there is no communication between them.
Please, it would be nice if you could add working or at least compilable examples.

Best regards
M.taghiloo

Working userspace prog (below was kernel module, not userspace)

On July 7th, 2005 Anonymous (not verified) says:

/* Working version of the Netlink Socket code from Linux Journal's Kernel Korner */
#include
#include
#include
#include
#include
#include

#define MAX_PAYLOAD 1024
struct sockaddr_nl src_addr, dst_addr;
struct nlmsghdr *nlh = NULL;
struct msghdr msg;
struct iovec iov;
int sock_fd;

int main()
{
sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_NITRO);

memset(&src_addr, 0, sizeof(src_addr));
src_addr.nl_family = AF_NETLINK;
src_addr.nl_pid = getpid();
src_addr.nl_groups = 0; // no multicast
bind(sock_fd, (struct sockaddr*)&src_addr, sizeof(src_addr));

memset(&dst_addr, 0, sizeof(dst_addr));
dst_addr.nl_family = AF_NETLINK;
dst_addr.nl_pid = 0; // 0 means kernel
dst_addr.nl_groups = 0; // no multicast

nlh = (struct nlmsghdr *)malloc(NLMSG_SPACE(MAX_PAYLOAD));

/* Fill the netlink message header */
nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
nlh->nlmsg_pid = getpid();
nlh->nlmsg_flags = 0;

strcpy(NLMSG_DATA(nlh), "Yoo-hoo, Mr. Kernel!");

iov.iov_base = (void *)nlh;
iov.iov_len = nlh->nlmsg_len;

msg.msg_name = (void *)&dst_addr;
msg.msg_namelen = sizeof(dst_addr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;

sendmsg(sock_fd, &msg, 0);

/* Read message from kernel */
memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
recvmsg(sock_fd, &msg, 0);
printf("Received message payload: %s\n", NLMSG_DATA(nlh));

close(sock_fd);

return (EXIT_SUCCESS);

}

ENOBUFS error (solved)

On December 9th, 2006 Sébastien Barré (not verified) says:

Hi,

First, I would like to thank the author a lot for this article, it was very useful indeed.
I have tried the user-space code above and noticed an ENOBUFS (No buffer space available) error. I finally discovered the reason: the 'struct msghdr msg' was not zeroed, and only some fields are filled (msg_name, msg_namelen, msg_iov, msg_iovlen), leaving, for example, the msg_controllen field undefined (the kernel checks it and returns ENOBUFS if it is too large).
My problem was solved by adding the following line :
memset(&msg,0,sizeof(msg));
(of course, before filling the various necessary fields of the message).

I hope this will help some people in getting their code working.
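
The fix described in this comment can be sketched as a small helper that zeroes the msghdr before filling in the four fields the examples use. The function name nl_prepare_msghdr is hypothetical; it only wraps the memset() suggested above together with the assignments from the article's listing.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Hypothetical helper: prepare msg to carry one netlink message. */
static void nl_prepare_msghdr(struct msghdr *msg, struct iovec *iov,
                              struct sockaddr_nl *peer,
                              struct nlmsghdr *nlh)
{
    /* Zero everything first so msg_control, msg_controllen and
     * msg_flags never hold stale values. */
    memset(msg, 0, sizeof(*msg));

    iov->iov_base = nlh;
    iov->iov_len = nlh->nlmsg_len;

    msg->msg_name = peer;
    msg->msg_namelen = sizeof(*peer);
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
}
```

Zeroing matters most when msg is a local variable on the stack; a global, as in the article's listing, starts out zeroed but can still carry leftover values if it is reused.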

The userspace program that compiles

On July 7th, 2005 Daniel Purcell (not verified) says:

/* The Linux Journal Kernel Korner -- Working, compiling version of the kernel code */
#include
#include
#include
#include
#include
#include
#include

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Daniel Purcell");
MODULE_DESCRIPTION("Kernel Korner's working version of netlink sockets");

// Note: Debug is not implemented
static int debug = 0;

module_param(debug, int, 0);
MODULE_PARM_DESC(debug, "Debug information (default 0)");

static struct sock *nl_sk = NULL;

static void nl_data_ready (struct sock *sk, int len)
{
wake_up_interruptible(sk->sk_sleep);
}

static void netlink_test()
{
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh = NULL;
int err;
u32 pid;

nl_sk = netlink_kernel_create(NETLINK_NITRO, nl_data_ready);
skb = skb_recv_datagram(nl_sk, 0, 0, &err);

nlh = (struct nlmsghdr *)skb->data;
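printk(KERN_INFO "%s: received netlink message payload: %s\n", __FUNCTION__, NLMSG_DATA(nlh));

pid = nlh->nlmsg_pid;
NETLINK_CB(skb).groups = 0;
NETLINK_CB(skb).pid = 0;
NETLINK_CB(skb).dst_pid = pid;
NETLINK_CB(skb).dst_groups = 0;
netlink_unicast(nl_sk, skb, pid, MSG_DONTWAIT);
sock_release(nl_sk->sk_socket);
}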

static int __init my_module_init(void)
{
printk(KERN_INFO "Initializing Netlink Socket");
netlink_test();
return 0;
}

static void __exit my_module_exit(void)
{
printk(KERN_INFO "Goodbye");
}

module_init(my_module_init);
module_exit(my_module_exit);

Problem communicating using NETLINK SOCKETS

On November 27th, 2006 Nagendra KS (not verified) says:

Hi,

I am using NETLINK sockets to communicate from user space to kernel space.
I have code in the kernel that is responsible for forwarding input IP packets to the IP stack. The module that I have written in the kernel blocks communication between the network driver and the IP stack. In this case the driver gives the incoming packet directly to our user-space program, which is waiting for such packets.

Once these packets arrive in user space, using netlink sockets I give them back to the kernel, where I have a netlink socket waiting for these packets.

I have a kernel thread running which waits for the packets from user space.
The piece of code that waits is given below:

skb = skb_recv_datagram(nl_sk_ip, 0, 0, &err);

This thread sleeps till it gets any data from user space. Once it gets a packet from user space, its only job is to inject that packet into the IP stack for processing.

Now I ping from my machine to some other machine in the network. The ping packet goes out in the normal way. But when the response comes back, the network driver, instead of giving it to the IP stack, gives it to the user-space program, which is listening on a raw socket. This user-space program forms a netlink message and sends it to the kernel-space netlink code. This code calls the entry function of the IP stack with the received packet. The IP stack analyzes the packet and sends the response back out in the normal way.

The problem is, the whole setup works fine for around 40 ICMP packets; after that, the "sendmsg" in user space returns with an EAGAIN (Resource temporarily unavailable) error.

Any idea why I am getting this error?
Your help in solving this would be appreciated.

Thanks,
Nagendra.

netlink socket using

On April 12th, 2005 Michael (not verified) says:

Hello,
The article is very clear and easy to understand. It describes the advantages of using netlink sockets. I suppose they might be very useful for communication between processes or threads in user-space applications. But on the kernel side there are disadvantages, such as:
1. The kernel must be recompiled, because netlink.h has to be updated.
2. Because the handler runs in the context of the sendmsg process, a plain ioctl is preferable simply because it is less complicated.
Any comments are very welcome,
Regards,
Michael

kernel to kernel communication

On February 25th, 2005 linuxram says:

I hear that netlink provides support for communication within two different subsystems of the kernel. Wish this article had covered that.

examples

On January 24th, 2005 mike_k says:

It would be nice if you could add working or at least compilable examples.

thanks,
-M

Code sample in kernel itself

On August 29th, 2005 Samiullah Mohammed (not verified) says:

netlink is implemented as a device like /dev/netlink on 2.4.20-8.
open, read, and write calls from userland on /dev/netlink actually map to socket calls.

The kernel-side code for netlink is under /usr/src/linux-2.4/net/netlink/netlink_dev.c
If you wish to customize it, you can change NETLINK_MAJOR to a number you like (check major.h) and compile the module separately with a makefile like

export KERN_NAME = linux-2.4.20-8
CFLAGS = -I /usr/src/$(KERN_NAME)/include -D__KERNEL__

netlink_dev1.o: netlink_dev1.c

user space code for 2.4 kernel

On October 18th, 2006 Anonymous (not verified) says:

Hi,

I am a netlink newbie. I saw your comment about the kernel space code on the 2.4 kernel. I was able to compile and load the kernel module as per your suggestion. Could you tell me how I can test it? I need user space code for that.

I tried the user space code provided in the article. It gives me the following errors:
mipsel-linux-gcc netlink.c
In file included from netlink.c:3:
/opt/toolchains/uclibc-crosstools_linux-2.4.25_gcc-3.3.5_uclibc-20050308-20050502/mipsel-linux-uclibc/sys-include/linux/netlink.h:22: error: parse error before "__u32"
/opt/toolchains/uclibc-crosstools_linux-2.4.25_gcc-3.3.5_uclibc-20050308-20050502/mipsel-linux-uclibc/sys-include/linux/netlink.h:28: error: parse error before "__u32"
/opt/toolchains/uclibc-crosstools_linux-2.4.25_gcc-3.3.5_uclibc-20050308-20050502/mipsel-linux-uclibc/sys-include/linux/netlink.h:30: error: parse error before "nlmsg_flags"
/opt/toolchains/uclibc-crosstools_linux-2.4.25_gcc-3.3.5_uclibc-20050308-20050502/mipsel-linux-uclibc/sys-include/linux/netlink.h:31: error: parse error before "nlmsg_seq"
/opt/toolchains/uclibc-crosstools_linux-2.4.25_gcc-3.3.5_uclibc-20050308-20050502/mipsel-linux-uclibc/sys-include/linux/netlink.h:32: error: parse error before "nlmsg_pid"
/opt/toolchains/uclibc-crosstools_linux-2.4.25_gcc-3.3.5_uclibc-20050308-20050502/mipsel-linux-uclibc/sys-include/linux/netlink.h:82: error: field `msg' has incomplete type
netlink.c: In function `main':
netlink.c:16: error: invalid application of `sizeof' to an incomplete type
netlink.c:17: error: invalid use of undefined type `struct sockaddr_nl'
netlink.c:18: error: invalid use of undefined type `struct sockaddr_nl'
netlink.c:19: error: invalid use of undefined type `struct sockaddr_nl'
netlink.c:20: error: invalid application of `sizeof' to an incomplete type
netlink.c:22: error: invalid application of `sizeof' to an incomplete type
netlink.c:23: error: invalid use of undefined type `struct sockaddr_nl'
netlink.c:24: error: invalid use of undefined type `struct sockaddr_nl'
netlink.c:25: error: invalid use of undefined type `struct sockaddr_nl'
netlink.c:27: error: invalid application of `sizeof' to an incomplete type
netlink.c:27: warning: assignment from incompatible pointer type
netlink.c:30: error: dereferencing pointer to incomplete type
netlink.c:30: error: invalid application of `sizeof' to an incomplete type
netlink.c:31: error: dereferencing pointer to incomplete type
netlink.c:32: error: dereferencing pointer to incomplete type
netlink.c:34: error: invalid application of `sizeof' to an incomplete type
netlink.c:37: error: dereferencing pointer to incomplete type
netlink.c:40: error: invalid application of `sizeof' to an incomplete type
netlink.c:49: error: invalid application of `sizeof' to an incomplete type
netlink.c:51: error: invalid application of `sizeof' to an incomplete type
netlink.c: At top level:
netlink.c:6: error: storage size of `src_addr' isn't known
netlink.c:6: error: storage size of `dst_addr' isn't known

My user space application code is as follows:
======================================
#include
#include

#define MAX_PAYLOAD 1024 /* maximum payload size*/
struct sockaddr_nl src_addr, dst_addr;
struct nlmsghdr *nlh = NULL;
struct msghdr msg;
struct iovec iov;
int sock_fd;

int main()
{
sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_FIREWALL);

memset(&dst_addr, 0, sizeof(dst_addr));
dst_addr.nl_family = AF_NETLINK;
dst_addr.nl_pid = 0; // 0 means kernel
dst_addr.nl_groups = 0; // no multicast

nlh = (struct nlhmsghdr *)malloc(NLMSG_SPACE(MAX_PAYLOAD));

/* Fill the netlink message header */
nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
nlh->nlmsg_pid = getpid();
nlh->nlmsg_flags = 0;

strcpy(NLMSG_DATA(nlh), "Yoo-hoo, Mr. Kernel!");

iov.iov_base = (void *)nlh;
iov.iov_len = nlh->nlmsg_len;

msg.msg_name = (void *)&dst_addr;
msg.msg_namelen = sizeof(dst_addr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
printf("Waiting for message from kernel\n");

sendmsg(sock_fd, &msg, 0);

/* Read message from kernel */
memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
recvmsg(sock_fd, &msg, 0);
printf("Received message payload: %s\n", NLMSG_DATA(nlh));

close(sock_fd);

return (0);

}

Thanks in advance

Ashwin.

Help Please

On July 20th, 2006 Eswari (not verified) says:

Hi,
I used the following command to compile netlink_dev.c:

cc -o netlink_dev.o netlink_dev.c -I /usr/src/linux-2.4.7-10/include -D__KERNEL__

I am getting the following errors:

/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o: In function `_start':
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o(.text+0x18): undefined reference to `main'
/tmp/ccYfFxIZ.o: In function `netlink_write':
/tmp/ccYfFxIZ.o(.text+0xc5): undefined reference to `sock_sendmsg'
/tmp/ccYfFxIZ.o: In function `netlink_read':
/tmp/ccYfFxIZ.o(.text+0x157): undefined reference to `sock_recvmsg'
/tmp/ccYfFxIZ.o: In function `netlink_open':
/tmp/ccYfFxIZ.o(.text+0x1e4): undefined reference to `sock_create'
/tmp/ccYfFxIZ.o(.text+0x244): undefined reference to `sock_release'
/tmp/ccYfFxIZ.o: In function `netlink_release':
/tmp/ccYfFxIZ.o(.text+0x2e7): undefined reference to `sock_release'
/tmp/ccYfFxIZ.o: In function `devfs_register_chrdev':
/tmp/ccYfFxIZ.o(.text+0x496): undefined reference to `register_chrdev'
/tmp/ccYfFxIZ.o: In function `init_netlink':
/tmp/ccYfFxIZ.o(.text.init+0x5a): undefined reference to `printk'
collect2: ld returned 1 exit status

Could you please help me, thanks

Eswari

compile error

On September 29th, 2005 liuhua (not verified) says:

I typed in all the source code from the article above on FC4 (kernel 2.6.11-1.1369_FC4-i686).

Kernel code error:
for "sk->sk_sleep" and "sock_release(nl_sk->sk_socket)":
dereferencing pointer to incomplete type

User code error:
on the line "nlh->nlmsg_len=NLMSG_SPACE(MAX_PAYLOAD)":
syntax error before '=' token

What is the reason? Please help me.

Re: compile error

On May 25th, 2006 Chinmaya (not verified) says:

Include the following line at the top of the program: #include <net/sock.h>

Thanks
Chinmaya

User Space module

On December 6th, 2006 Amit Sahrawat (not verified) says:

#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <string.h>
#include <asm/types.h>
#include <linux/netlink.h>
#include <linux/socket.h>

#define NETLINK_TEST 17
#define MAX_PAYLOAD 1024

struct sockaddr_nl src_addr, dst_addr;
struct nlmsghdr *nlh = NULL;
struct msghdr msg;
struct iovec iov;
int sock_fd;

int main(int argc, char **argv)
{
    sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_TEST);
    if (sock_fd < 0) {
        perror("socket");
        return 1;
    }
    printf("SOCK FD :%d \n", sock_fd);

    memset(&dst_addr, 0, sizeof(dst_addr));
    dst_addr.nl_family = AF_NETLINK;
    /* argv[1] is the destination process ID; 0 (the default) means kernel */
    if (argc > 1) {
        printf("%s :", argv[1]);
        dst_addr.nl_pid = atoi(argv[1]);
    } else {
        dst_addr.nl_pid = 0;
    }
    dst_addr.nl_groups = 0; // no multicast

    nlh = (struct nlmsghdr *)malloc(NLMSG_SPACE(MAX_PAYLOAD));
    if (nlh == NULL) {
        perror("malloc");
        close(sock_fd);
        return 1;
    }

    /* Fill the netlink message header */
    nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
    nlh->nlmsg_pid = getpid();
    nlh->nlmsg_flags = 0;
    strcpy(NLMSG_DATA(nlh), "User Spaces: Message from User to Kernel!");

    iov.iov_base = (void *)nlh;
    iov.iov_len = nlh->nlmsg_len;
    msg.msg_name = (void *)&dst_addr;
    msg.msg_namelen = sizeof(dst_addr);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    sendmsg(sock_fd, &msg, 0);

    /* Read the reply into the same buffer */
    memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
    recvmsg(sock_fd, &msg, 0);
    printf("Received message payload: %s\n", (char *)NLMSG_DATA(nlh));

    free(nlh);
    close(sock_fd);
    return 0;
}

Save as netwriter.c and compile using
gcc netwriter.c -o netwriter
To run it, give the destination process ID as the argument ('0' for the kernel).


Kernel Korner - Why and How to Use Netlink Socket

January 5th, 2005 by Kevin Kaichuan He in SysAdmin
