I’m not really into the niche of “virtual YouTubers” – people who post YouTube videos and/or stream using a virtual avatar – but to each their own, and if this technology enables people to remain anonymous while doing what they love on YouTube or Twitch, I’m all for it. Since these virtual avatars also do things like face tracking, there’s a whole cottage industry of software tools to make it all work. Adrian “asie” Siekierka decided to take a look at where the training data that makes this face tracking possible actually comes from.
One day, some years ago, I decided to look at the data used to train OpenSeeFace. OpenSeeFace is the most popular open source face tracking solution for virtual YouTubers. It is supported by both open source and commercial model rendering tools; in particular, VTube Studio allows using it as an option for webcam tracking.
↫ Adrian “asie” Siekierka
The results of the investigation are not exactly great. Much of the data used by OpenSeeFace comes with serious restrictions on commercial use, and many of the underlying datasets contain images of people whose consent you would need to actually use them. On top of that, a lot of these datasets seem to have simply scraped the internet for images of faces without asking any of the people in those images for consent, which raises a whole host of troubling issues.
I find this a very interesting topic of discussion, if only because you’d be hard-pressed to argue that the average cartoon-esque virtual avatar even remotely resembles a real human face, so it’s not like you’re going to suddenly run into your own face somewhere on YouTube or Twitch, plastered onto another person. On the other hand, the underlying datasets still contain a ton of people’s faces used without those people’s consent, and even where consent was given, there’s often a commercial-use restriction that earning revenue on YouTube or Twitch might violate.
It’s a fascinating microcosm of a whole slew of issues we’re dealing with right now, neatly contained in a relatively small niche.

I was totally unaware of this space, but now that I vaguely know about it from reading this, what I find interesting is a potential, eventual future of expanded VR (Coming Soon(tm)!) usage where we actually have face tracking and response while the VR avatars don’t necessarily look like the actual people behind them. Think a bit like the avatars in Snow Crash, and the potential for some real non-verbal communication and cues to be carried through VR, even with fantasy-style avatars.
I think it’s a nice, interesting potential building block, and I’m glad some people are pursuing it, even if I personally don’t think I’ll be using it in its current incarnations.
What you describe is the present (certainly niche, but with actual users). There are people who hang out in VRChat with full body tracking (FBT), eye tracking, and face tracking. That lets the avatar reproduce dance moves rather faithfully, lets the eyes indicate where the person is looking, and produces somewhat representative mouth movements.
It all runs on consumer hardware (a high-spec gaming PC, a Vive Pro Eye or Bigscreen Beyond 2e headset, enough Vive/Tundra trackers for 11-point FBT, and a Vive face tracker). On the software side, just Steam and VRChat. Certainly a pretty expensive hobby (not mine).
There are VTubers who use that setup for VTubing, or whatever it is called ;).
Antartica,
Progress has always been like that.
Decades ago, the first VR systems I used were in million-dollar labs. One of them required permission from some government department just to get access during my internship.
Today we can use $300 goggles from Meta and have a similar experience.
This is another issue where ethics can be counterintuitive.
The “early adopters” and “market movers” have captured an enormous amount of data. Before social media platforms like Reddit or Twitter realized how valuable their user-generated content was, it was all scraped and put into training datasets. The entire web, Wikipedia, and many specialized websites, forums, and so on were also included.
Today, Reddit charges for this data (I think they may have reached the point where most of their revenue comes from actually selling user data, not ads), and Twitter is locked down. Wikipedia is poisoned, and the web is deploying “anti-AI scraping” mechanisms.
This creates a two-tier, unequal world, where the early movers not only have the usual advantage, but also use the new de facto and de jure regulations to block newcomers from even having the data to compete.
And if they are asked to “clean up”, it would be relatively easy for them to build a “clean room” version of the data derived from the original, and purge those images. In other words, they have a moat: not a hardware moat, not a software moat, but a data moat.
Do I have a solution?
No.
The genie is already out of the bottle.
sukru,
It’s good food for thought.
I think it would be pretty cool to have more community-led AI projects to compete against proprietary companies, but there’s no doubt that we’ve closed a lot of the gates for new players. I don’t really know what we can or should do about this, but ironically, I also see that the actions we’re taking to protest AI companies are actually solidifying their advantages.
Yes.
One of the first things OpenAI’s CEO did after open source models like Llama became viable… was to run to government heads asking for “sensible AI regulation”. In other words, the big players really want regulatory capture.