New C/C++ Source Code Search Website

Submitted by Sembiance 2005-08-10 General Development 29 Comments

“After over a year of work, my C/C++ Source Code Search website is finally live! It allows you to search over 107 million lines of open source C/C++ code and it actually understands the C/C++ syntax thus giving better results. I’m currently adding 2 to 3 million lines of new code per day.”

About The Author

Thom Holwerda

29 Comments

2005-08-10 7:22 pm

Anonymous
An interesting and very welcome search tool! I’m surprised that I’ve not seen anything like this before.

The only niggle I have with it is that searches are very slow. 🙁
2005-08-10 7:25 pm

Anonymous
Has anyone had any luck accessing the site, or has it been OSNews-dotted?
2005-08-10 7:28 pm

Anonymous
Looks very promising, and I really like the idea. Yes, it’s a bit slow right now, but the website says to expect that (poor little Celeron). Will try it again once the load goes down a bit…
2005-08-10 8:05 pm

case
Really great piece of work. Talk about persistence!! I tried it out and it works great. The site’s slow, but that can be solved.

Does anybody have a few mainframes we could send the guy?
2005-08-10 8:19 pm

Anonymous
Yes, sorry about the speed.

It was doing okay, but the number of hits coming from here (OSNews) is too much for it.

I can’t really afford more servers at the moment, so check back in a bit; hopefully it’ll be responsive then.
2005-08-10 8:19 pm

Sembiance
Forgot to log in above
2005-08-10 8:43 pm

Anonymous
How does it compare with koders? The site is at the moment too slow to be usable so I can’t test it myself.

2005-08-10 8:51 pm

Sembiance
Koders treats all its code as plain text files.

Sometimes when you search it can be difficult to find exactly what you’re after, since Koders doesn’t know the difference between a comment, a class definition, or a called function.

My site understands C/C++ syntax and thus allows you to search only the type of code construct you are after.

Also, since my site understands syntax, it is able to hyperlink the code whenever a function is called or a file is included, so you can click on it and be taken directly to the source for that function or include file.

Oh, and my site is a LOT slower right now because of the traffic and the small box it’s on.
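To illustrate the difference between plain-text and syntax-aware indexing described above, here is a rough Python sketch. It is a regex heuristic, not a real C/C++ parser and not how csourcesearch actually works. It strips comments first, so a name that only appears in a comment is not indexed as code, and it treats `name(...) {` as a function definition and any other `name(...)` as a call. The function names (`classify`, `_char_after_parens`) and the sample source are hypothetical.

```python
import re

# Hypothetical, regex-based approximation; a real C/C++ parser is needed for
# string literals, macros, templates, prototypes, classes, etc.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
NAME_PAREN_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}


def _char_after_parens(code, open_pos):
    """Return the first non-blank character after the ')' matching code[open_pos]."""
    depth = 0
    for i in range(open_pos, len(code)):
        if code[i] == "(":
            depth += 1
        elif code[i] == ")":
            depth -= 1
            if depth == 0:
                return code[i + 1:].lstrip()[:1]
    return ""


def classify(source):
    """Split C source into comments, includes, function definitions and calls."""
    comments = COMMENT_RE.findall(source)
    code = COMMENT_RE.sub(" ", source)  # blank out comments so they are not matched as code
    definitions, calls = set(), set()
    for m in NAME_PAREN_RE.finditer(code):
        name = m.group(1)
        if name in KEYWORDS:
            continue
        # "name(...) {" looks like a definition; anything else is treated as a call
        if _char_after_parens(code, m.end() - 1) == "{":
            definitions.add(name)
        else:
            calls.add(name)
    return {
        "comments": comments,
        "includes": INCLUDE_RE.findall(code),
        "definitions": sorted(definitions),
        "calls": sorted(calls),
    }


SAMPLE = """#include <stdio.h>
/* helper() is documented here */
static int helper(int x) { return x * 2; }
int main(void) {
    // printf("ignored");
    printf("%d", helper(21));
    return 0;
}"""

if __name__ == "__main__":
    result = classify(SAMPLE)
    print(result["definitions"], result["calls"])  # ['helper', 'main'] ['helper', 'printf']
```

Once constructs are classified like this, a search for `helper` can be limited to definitions, and each call site or `#include` can be linked to the matching definition or file. A plain-text index would instead also match the mention of `helper()` inside the comment.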

2005-08-10 8:59 pm

ma_d
It looks cool. You should get a good charity to host it for you. Or make a local version of it?

2005-08-10 8:47 pm

Sembiance
Okay, I cleaned up some processes on the server and tried some new settings for Apache.

The site seems to be responding much better now, and hopefully I can continue to weather the traffic.

2005-08-10 10:09 pm

eKstreme
How about hosting it on SourceForge? I bet they would take it up in a snap!

Good job, btw. I was surprised to see some bioinformatics source in there; I never thought anyone cared beyond geeky biologists.

2005-08-10 9:13 pm

bact
Thank you for this cool piece of work! Kudos!

Any plans for Python, Java, …?

2005-08-10 9:21 pm

Sembiance
Currently I don’t have any hard plans on adding other languages.

I still need to add several hundred million new lines of C/C++ code, and I still haven’t figured out what I’ll do about the server.

Technically speaking, though, there is no reason why Python, Java, PHP, etc. couldn’t all be added as well.

Maybe in the future I’ll have the time/CPU power/disk space to do that.
2005-08-11 6:03 pm

Anonymous
Oh yes, let’s write it all in the newest available languages; those *must* be much better than the old and dusty C(++).

Sembiance, can you add the brainfuck language please?

2005-08-10 9:23 pm

Anonymous
It is cute, but all that is necessary for a general source search is a parser, indexer, and searcher.

For indexing and searching, one can toss basically everything into a database that has an index on the indexable item, with a separate table for each kind of indexable construct.

As for the parser, that’s easy too. Choose your favorite parser (I like DParser personally), and have it insert entries into the database.

Toss a frontend on that beast, and you are done.
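The database side of this recipe could look roughly like the following Python sketch, using the standard-library `sqlite3` module. The schema (one table per construct kind with `name`, `file`, `line` columns), the table names and the helper functions are all hypothetical. The parser stage is left out: the commenter's suggestion is DParser, whose API is not shown here, and a later comment notes that csourcesearch itself uses Lucene rather than a database.

```python
import sqlite3

# Hypothetical schema: one table per kind of indexable construct.
KINDS = ("definitions", "calls", "includes")


def create_tables(conn):
    """Create one table, plus an index on the searched column, per construct kind."""
    for kind in KINDS:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {kind} (name TEXT, file TEXT, line INTEGER)")
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{kind}_name ON {kind} (name)")


def _check_kind(kind):
    # Table names cannot be bound as SQL parameters, so whitelist them instead.
    if kind not in KINDS:
        raise ValueError(f"unknown construct kind: {kind!r}")


def add_entry(conn, kind, name, file, line):
    """Would be called by the parser stage for every construct it recognises."""
    _check_kind(kind)
    conn.execute(f"INSERT INTO {kind} (name, file, line) VALUES (?, ?, ?)", (name, file, line))


def search(conn, kind, name):
    """Exact-name lookup restricted to one kind of construct."""
    _check_kind(kind)
    cur = conn.execute(f"SELECT file, line FROM {kind} WHERE name = ? ORDER BY file, line", (name,))
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    add_entry(conn, "definitions", "helper", "util.c", 3)
    add_entry(conn, "calls", "helper", "main.c", 10)
    add_entry(conn, "calls", "helper", "main.c", 42)
    print(search(conn, "calls", "helper"))  # [('main.c', 10), ('main.c', 42)]
```

Keeping each construct kind in its own indexed table is what makes "search only function definitions" a single indexed lookup. The trade-off is that exact-match SQL gives none of the ranking or fuzzy matching a dedicated engine like Lucene provides.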

2005-08-10 9:36 pm

Anonymous
It is always easy to say that after somebody has come up with the idea… Maybe you could try implementing the search for other programming languages yourself.

2005-08-14 10:57 pm

Anonymous
I’ve got a noncompete with my employer, even if I had the time to build the thing in the first place.

My comment wasn’t based on hindsight (as you suggest); it is based on experience. For the last year I have been employed building search engines for a startup, and it is not hard if you know what you are doing. (We don’t use a DB, but using a DB gets you a working application sooner than writing your own scalable indexer and query engine. Checking his About page, he uses Lucene as his indexer and query engine, with the CodeWorker C/C++ parser plugged in on the front end.)

As for coming up with the idea, ctags has been around for, what, over a decade? Indexing and searching source code is not a new idea.

2005-08-10 11:41 pm

rover
Not all open source licenses are equal, nor are they all compatible with one another. The ability to filter results by license is absolutely essential.

2005-08-11 3:32 am

Sembiance
You can search within certain licenses by going to the license details page:

http://csourcesearch.net/license/GPL-2/

And then choosing to only search within that license.

I’m working on adding a ‘browse’ section where you can browse licenses, so it’ll be easier to find a given license.

2005-08-11 2:25 am

Anonymous
And thanks for the site. That’s all I have to say.
2005-08-11 6:05 am

Anonymous
Ahem… http://www.koders.com/ … already does the same 😉
2005-08-11 6:28 am

Anonymous
Can’t you read?

The differences were outlined earlier in the comments.
2005-08-11 8:55 am

l3v1
Just wanted to say the same thing. Koders (www.koders.com) has been doing the same for over 20 [programming] languages. With the appropriate Firefox search plugins, it just rules.

As for this new C code search site, I wish you well, and as the FOSS attitude suggests, choice is good, but this time I think it will take a miracle to do better.

Also, the line “sorry Opera and Konqueror users” doesn’t make it more appealing.

All in all, I wish you good luck, but Koders is my place.
2005-08-11 9:13 am

Sembiance
I’ve added a browse page so you can browse all packages, licenses and categories.
2005-08-11 10:43 am

John Nilsson
I’ve never used anything like this before. What a wonderful application!

I just did the same search on both sites, and I must say that csourcesearch has a MUCH better way of presenting hits.
2005-08-11 1:41 pm

Anonymous
The idea is nice, but the implementation apparently sucks big time: a search takes a few dozen seconds.
2005-08-11 4:31 pm

Anonymous
csourcesearch seems to produce much better results than Koders. Congrats to the author…
2005-08-11 5:16 pm

Anonymous
Awesome! It just so happened that I had to implement Huffman compression for my personal project. I found the code there and implemented it in a couple of minutes.

Granted, I could probably have done that with Google with the same success. But this new tool is elegant and clean to use.

Hopefully it will have a nice future.
2005-08-15 8:30 am

Anonymous
Though I’m obliged to say that because my code is floating around in it!

http://csourcesearch.net/package/anagramarama/0.2/anagramarama/src/…

Now a request: could you add matching and detection for constants? E.g. CLEARBOXSTARTY in the package above.

This is a cool app, and I like the XSLT too. A project I recently completed used Perl to build a matrix.xml that indexed a load of cross-referenced requirements specifications, then XSLT to search and format them. It was a much more restricted data set than millions of lines of code, and it needed much fuzzier matching, but it turned a useless directory structure of requirements specs (mostly use cases) into a useful resource.

What I did, though, was send the whole XML directly to the client and do the sorting and filtering on the client (it’s an intranet application, so a 2MB XML download is not painful). Perhaps you could use this bundling approach to reduce the number of hits: say, bring back a whole source module in the XML rather than just the requested function.

It’s cool to see that I’m not working in isolation, though to my shame, mine only works on IE… (*blush*) I never had the time to work out Firefox’s XSLT support and always ran into problems.

Thanks

And btw, performance is good for me; perhaps it’s only a problem at peak times, or perhaps you’ve found some bandwidth.

—

Colm