<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Align Chronicles</title>
    <description>A retro sci-fi terminal blog exploring the intersection of technology, artificial intelligence, and human alignment in a cyberpunk future.</description>
    <link>https://vinayprabhu.github.io/alignchronicles/</link>
    <atom:link href="https://vinayprabhu.github.io/alignchronicles/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 01 Feb 2026 07:47:23 +0000</pubDate>
    <lastBuildDate>Sun, 01 Feb 2026 07:47:23 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Gas, Whey &amp; Lobsters and Indian-Homework</title>
        <description>&lt;h1 id=&quot;gas-whey--lobsters-and-indian-homework&quot;&gt;Gas, Whey &amp;amp; Lobsters and Indian-Homework&lt;/h1&gt;

&lt;h1 id=&quot;whey-gas-and-lobsters&quot;&gt;Whey, gas and lobsters&lt;/h1&gt;

&lt;p&gt;Wrongfully castigating an invaluable resource as wasteful dross has been one of humanity’s most original, artful and cardinal sins.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Whey - &lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S0958694608000344&quot;&gt;Gutter-to-Gold&lt;/a&gt;: Until the &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0924224421005124&quot;&gt;development of membrane filtration technologies in the 1960s&lt;/a&gt;, protein-rich &lt;strong&gt;&lt;em&gt;sweet whey&lt;/em&gt;&lt;/strong&gt; was literally poured down the drains by the dairy industry. Today, it is widely considered the go-to protein choice for fitness fanatics, with a burgeoning &lt;a href=&quot;https://www.prnewswire.com/news-releases/whey-powder-market-size-is-expected-to-reach-us5-80-billion-by-2031-at-cagr-of-5-5-during-the-forecast-period--the-insight-partners-302550792.html&quot;&gt;multi-billion-dollar market&lt;/a&gt;. It is painful to imagine the civilizational cost we paid by never building distribution pipelines to share this nutritious by-product with the masses and sustain them on high-protein diets!&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_image.png&quot; alt=&quot;Source: https://www.sciencedirect.com/science/article/pii/S0924224421005124&quot; /&gt;&lt;/p&gt;

    &lt;p&gt;Source: &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0924224421005124&quot;&gt;https://www.sciencedirect.com/science/article/pii/S0924224421005124&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Natural gas - &lt;a href=&quot;https://www.si.edu/object/invisible-fuel-manufactured-and-natural-gas-america-1800-2000-christopher-j-castaneda:siris_sil_627688&quot;&gt;Invisible fuel&lt;/a&gt;: Natural gas was historically considered a worthless byproduct of oil drilling and was often burned at the wellhead. Vast amounts of engineering clock cycles were spent inventing techniques such as &lt;strong&gt;&lt;em&gt;Optimal-venting&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;Production-flaring&lt;/em&gt;&lt;/strong&gt; with the sole goal of efficiently disposing of this invaluable resource, right up until the &lt;a href=&quot;https://en.wikipedia.org/wiki/Natural_gas&quot;&gt;early part of the 20th century&lt;/a&gt;. Today, it is not only hailed as the &lt;a href=&quot;https://www.igs.com/energy-resource-center/energy-101/what-makes-natural-gas-the-cleanest-fossil-fuel&quot;&gt;cleanest fossil fuel&lt;/a&gt; but is also considered ‘an integral part of &lt;a href=&quot;https://obamawhitehouse.archives.gov/the-press-office/2014/01/28/president-barack-obamas-state-union-address&quot;&gt;green-energy strategies around the world as the bridge-fuel&lt;/a&gt; of choice’!&lt;/li&gt;
  &lt;li&gt;Lobsters - Cockroach to Caviar: Lobsters were once called the “&lt;a href=&quot;https://www.foodbeast.com/news/oh-lobster-you-so-fancy/&quot;&gt;cockroaches of the sea&lt;/a&gt;,” stigmatized scavengers so abundant in colonial New England that they would wash ashore in knee-deep piles after storms. Far from the luxury status they enjoy today, these crustaceans were viewed as the “protein of the bad man,” a cheap and “cruel” filler fed to prisoners and indentured servants. Today, lobster is a gourmet protein served in high-end restaurants and currently sustains a &lt;a href=&quot;https://www.stellarmr.com/report/lobster-market/2814&quot;&gt;$7 billion market growing at a scorching pace&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id=&quot;indias-silent-crime-throwing-away-the-homework-treasure-trove&quot;&gt;India’s silent crime: Throwing away the homework treasure-trove&lt;/h1&gt;

&lt;p&gt;Alongside the whey, gas and lobster triad lies, I believe, India’s homework treasure-trove. India criminally throws away its academic data goldmine: the reams and reams of graded homework her students produce every day!&lt;br /&gt;
&lt;strong&gt;The numbers:&lt;/strong&gt; Assuming an average student takes 6 courses and generates roughly 100 KB worth of data per course each week via homework, assignments and tests, we have a mindboggling 998.4 GB of data being generated every academic calendar year spanning 32 weeks by a single academic group like the &lt;a href=&quot;http://www.jpsbangalore.com/chairman.aspx&quot;&gt;Jain group&lt;/a&gt; (which boasts a student body of 51,600 students). This data-trove, as we type this, is being discarded into the abyss, never to be utilized in any way, and the scale of the wastage is staggering. To put things into perspective, the data mined from a single academic group in a single year spans more than &lt;a href=&quot;https://arxiv.org/abs/2101.00027&quot;&gt;&lt;em&gt;The Pile&lt;/em&gt;&lt;/a&gt;, one of the largest openly available datasets (which is &lt;em&gt;only&lt;/em&gt; 825 GB). Add to this the fact that much of this data has been graded by a human evaluator (an examiner assigning grades and providing feedback notes), and we have raw material that &lt;strong&gt;&lt;em&gt;facilitates rich algorithmic advances that the field is yet to see!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
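
&lt;p&gt;A quick back-of-envelope check of that figure, sketched below under my own assumptions (100 KB per course per week, and the 51,600-strong student body rounded to 52,000):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Back-of-envelope check of the homework data-trove estimate above.
# Assumptions (mine): 100 KB per course per week, and the student body
# rounded from 51,600 to 52,000.
students = 52_000          # Jain group student body, rounded
courses = 6                # courses per student
kb_per_course_week = 100   # homework, assignments and tests
weeks = 32                 # one academic calendar year

total_kb = students * courses * kb_per_course_week * weeks
total_gb = total_kb / 1e6  # decimal KB-to-GB conversion
print(f'{total_gb:.1f} GB per year')  # 998.4 GB, exceeding The Pile (825 GB)
&lt;/code&gt;&lt;/pre&gt;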

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_Available_textual_data_on_the_WWW.png&quot; alt=&quot;The data moat is a data iceberg!  &quot; /&gt;&lt;/p&gt;

&lt;p&gt;The data moat is a data iceberg!&lt;/p&gt;

&lt;p&gt;Now comes the obvious question: Whey had its “Protein moment”. Lobsters had their “Gourmet moment”. Natural gas had its “Bridge-fuel moment”. What would be the X in Indian homework’s X moment? I posit that graded homework will have its “RL (Reinforcement Learning) moment”.
In the following sections I will motivate how India’s gargantuan base of graded homework can be a game-changer for progressing state-of-the-art AI towards its prophesied “AGI” moment, and how this could be orchestrated ethically and profitably.&lt;/p&gt;

&lt;h2 id=&quot;a-peek-into-how-sota-aillms-are-trained&quot;&gt;A peek into how SOTA AI/LLMs are trained&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_image%201.png&quot; alt=&quot;The pipeline was sourced from: https://allenai.org/blog/olmo3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The pipeline was sourced from: &lt;a href=&quot;https://allenai.org/blog/olmo3&quot;&gt;https://allenai.org/blog/olmo3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As seen in the figure above, &lt;a href=&quot;https://allenai.org/blog/olmo3&quot;&gt;the state-of-the-art AI/Large Language Model training pipeline&lt;/a&gt; is largely dichotomized into two phases:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The pre-training phase (or the boring dead phase) that results in a “Base Model”.&lt;/li&gt;
  &lt;li&gt;The post-training phase that results in “Thinking Models”.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;a href=&quot;https://huggingface.co/datasets/allenai/dolma3_mix-6T-1025&quot;&gt;state-of-the-art pre-training data&lt;/a&gt; is mostly derived from Common Crawl (4.5T of the 6T tokens for Dolma 3) and augmented with academic publications, code etc. This part of the pipeline is widely believed to be the &lt;em&gt;boring part or even the &lt;a href=&quot;https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training&quot;&gt;dead part&lt;/a&gt;&lt;/em&gt;, with most of the WWW having already been scraped dry.
The post-training phase is where the real action is and the real fun begins. The base model that emerges freshly baked from the pre-training phase is just an unaligned token-predictor that is unworthy of any real economic utility and most certainly un-deployable as a chatbot. But when it is subjected to a guided Reinforcement Learning from Human Feedback (RLHF) / Supervised Fine-Tuning (SFT) phase (see figure below, taken from Andrej Karpathy’s canonical talk), it turns into a &lt;em&gt;chat-bot-styled Thinking model&lt;/em&gt; that is also putatively aligned with complex “human values”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_image%202.png&quot; alt=&quot;Source: State of GPT talk by Andrej Karpathy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.youtube.com/watch?v=bZQun8Y4L2A&quot;&gt;https://www.youtube.com/watch?v=bZQun8Y4L2A&lt;/a&gt; - State of GPT talk by Andrej Karpathy&lt;/p&gt;
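
&lt;p&gt;To make the preference-learning step concrete, here is a minimal sketch of the reward-modeling idea at the heart of RLHF. Everything here is a toy stand-in (random vectors in place of real response embeddings, a linear reward head in place of an LLM); it illustrates the Bradley-Terry objective, not any lab’s actual pipeline:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Toy reward-model sketch: learn a scalar reward such that responses
# humans preferred score higher than the ones they rejected.
import torch

torch.manual_seed(0)
dim = 16
# Stand-ins for response embeddings; a real pipeline would use the LLM's
# own representations of (prompt, response) pairs.
chosen = torch.randn(256, dim) + 0.5    # human-preferred responses
rejected = torch.randn(256, dim) - 0.5  # human-rejected responses

reward_head = torch.nn.Linear(dim, 1)
opt = torch.optim.Adam(reward_head.parameters(), lr=1e-2)

for step in range(200):
    r_chosen = reward_head(chosen)
    r_rejected = reward_head(rejected)
    # Bradley-Terry objective: maximize P(chosen is preferred over rejected)
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print('final preference loss:', loss.item())
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The learned reward then serves as the optimization target for the policy (the base model) in the RL step proper.&lt;/p&gt;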

&lt;p&gt;This phase of human preference alignment is currently stricken with serious shortcomings, is widely known to be a specialized dark art, and is also where the data-labelers’ gold rush has been. From the algorithmic-efficiency perspective, experts such as Yann LeCun have repeatedly expressed deep skepticism about it, once even branding it &lt;strong&gt;&lt;em&gt;“hopeless”&lt;/em&gt;&lt;/strong&gt; (see tweet screenshot below).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_image%203.png&quot; alt=&quot;Yann LeCun branding RLHF hopeless&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://x.com/search?q=from%3Aylecun%20RLHF%20is%20hopeless&amp;amp;src=typed_query&quot;&gt;https://x.com/search?q=from%3Aylecun%20RLHF%20is%20hopeless&amp;amp;src=typed_query&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main nuance is that RL appears to be quack-science in low-data regimes, suddenly awakens into a formidable beast in the large-data regime, and remains aggressively data-inefficient throughout. In fact, there’s an &lt;a href=&quot;https://value-scaling.github.io/&quot;&gt;entire cottage industry of literature&lt;/a&gt; on the &lt;a href=&quot;https://www.lesswrong.com/posts/xpj6KhDM9bJybdnEe/how-well-does-rl-scale&quot;&gt;breathtakingly inefficient scaling&lt;/a&gt; behavior of RL.&lt;/p&gt;

&lt;p&gt;It is precisely this inefficiency that has birthed overnight unicorns such as Mercor, Surge and Scale AI (Semianalysis has an absolutely &lt;a href=&quot;https://newsletter.semianalysis.com/p/scaling-reinforcement-learning-environments-reward-hacking-agents-scaling-data&quot;&gt;brilliant blog post&lt;/a&gt; covering this).
And it is precisely here that India’s civilizational play lies. If we can build a bridge between India’s civilizational moat and the RL training pipelines, it will accelerate AI timelines at a rate beyond an AI-maximalist’s imagination.&lt;/p&gt;

&lt;h1 id=&quot;the-roadblocks-and-the-concerns&quot;&gt;The roadblocks and the concerns&lt;/h1&gt;

&lt;p&gt;The current orchestration mechanism for collecting the human feedback is fraught with other serious ethical shortcomings such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Labor exploitation:&lt;/strong&gt; An investigation by Time magazine revealed that &lt;a href=&quot;https://time.com/6247678/openai-chatgpt-kenya-workers/&quot;&gt;&lt;em&gt;OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic&lt;/em&gt;&lt;/a&gt;. This fits squarely into the larger pattern of abusive labor practices that were hitherto used for content-moderation jobs and are now being replicated under the banner of RLHF. Beyond the monetary perspective, we also have to grapple with the psychological and emotional impact on the workers and labelers exposed to the toxic textual and visual content they are expected to provide feedback about.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scalability and quality:&lt;/strong&gt; It is rather self-evident that scaling computation by recruiting larger swathes of GPU-laden server farms is much easier than scaling up human labor. Google Cloud rents out state-of-the-art A100 GPUs for as little as &lt;a href=&quot;https://cloud.google.com/blog/products/compute/a2-vms-with-nvidia-a100-gpus-are-ga/&quot;&gt;$0.87&lt;/a&gt; per hour! Human labor, on the other hand, is expensive, &lt;em&gt;slow&lt;/em&gt; (oft capped at 40 hrs/week), error-ridden, requires substantial managerial overhead, and is fraught with threat vectors such as &lt;a href=&quot;https://canvas.northwestern.edu/courses/65895/files/4174912/download?verifier=lj3RyqxMcHrB5PPB9P12Lxmnf6LWlPfYSH2SiHc3&amp;amp;wrap=1&quot;&gt;malicious/adversarial labelers&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Whose ethics&lt;/strong&gt; are getting baked in during RLHF? When &lt;em&gt;Time&lt;/em&gt;’s exposé of the use of cheap Kenyan labor for RLHF was published, the question of whether the responses had imbibed localized Kenyan ethics was raised on several social media platforms. While this insinuation is reductionist, the more nuanced notion that constructs such as toxicity, offensiveness and malignity are contextualized differently across socio-temporal contexts is extremely important to consider. It is foolhardy to believe that a model trained on a blackbox dataset scraped from the internet in a San Francisco lab and fine-tuned with the responses of exploited Kenyan labor would be ethically robust in a setting as complex as India’s.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_Untitled.png&quot; alt=&quot;Fig 11: ‘Kenyan ethics’ baked in?&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Fig 11: ‘Kenyan ethics’ baked in?&lt;/p&gt;

&lt;h1 id=&quot;the-devil-is-in-the-details-imagining-a-typology-of-implementation-paradigms&quot;&gt;The devil is in the details: Imagining a typology of implementation paradigms&lt;/h1&gt;

&lt;p&gt;Before we dive deep into the details of specific implementation strategies, we’d like to set the stage by presenting a curated list of the main motivating factors at stake that will help contextualize the schema of solutions proposed.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;There’s an imminent risk of a dangerous schism between the AI-augmented education sector in the West and the non-AI-augmented one in developing nations such as India, one that will have serious consequences for the already lopsided balance of power between the global north and the global south.&lt;/li&gt;
  &lt;li&gt;Programming, and hence software engineering, is on the cusp of disruption. In many ways, the so-termed software revolution of the 90s and 2000s gave India and Indian engineering talent a preeminent place on the global stage. With that disruption now under way, it is imperative that we revamp our curricula, strategically infusing LLMs via the schemes advocated below.&lt;/li&gt;
  &lt;li&gt;As stated above, large tracts of the BPO sector stand on the cusp of disruption as well. Menial labeling tasks are getting automated out to LLMs faster than one thinks.&lt;/li&gt;
  &lt;li&gt;On the bright side, there’s a once-in-a-generation opportunity to create the newest and largest talent pool of LLM knowhow and grab thought leadership in this domain by moving fast.&lt;/li&gt;
  &lt;li&gt;With the rise and rise of LLMs, uniquely human content creation will emerge as a premium entity. Allowing our students to intimately grapple with LLMs will provide them with a front-row seat towards understanding the &lt;em&gt;qualia&lt;/em&gt; of creativity, thus potentially turning India into a bastion of creative content that will, in turn, provide us with tremendous leverage as a premium source of raw data for the next wave of LLMs.&lt;/li&gt;
  &lt;li&gt;We can harness the emergence of the LLM-threat to finally address the rote-learning tropes oft-associated with the Indian educational sector. The rising evidence of ChatGPT-like systems acing entrance examinations should hopefully provide us with the needed impetus towards overhauling our exam-oriented system and turning it into one where grading is decided by the measurable contributions produced by the students in lieu of masterful memorization of the study materials.&lt;/li&gt;
  &lt;li&gt;Subsidization: By establishing a virtuous loop between the LLM ecosystem and universities, the revenue generated by the students’ throughput can be harnessed towards subsidizing their education.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these thoughts firmly in tow, we now shift our focus towards the two flavors of intervention proposed: Passive and Active. The terms active and passive have been developed in a student-centric framework focusing on the primary texture of their engagement with the technology. By passive, we mean those interventions where study-aid tools and curricula are developed &lt;em&gt;for&lt;/em&gt; the students, and by active, we mean those interventions generated &lt;em&gt;by&lt;/em&gt; the students.&lt;/p&gt;

&lt;h2 id=&quot;a-passive-interventions&quot;&gt;a) Passive interventions:&lt;/h2&gt;

&lt;p&gt;One of the biggest shortcomings of India’s educational sector, especially at the K-12 level, has been the stark difference in instruction quality between the predominantly urban top-tier institutions and their rural counterparts. We’d insist that we are at an exciting juncture where, with the right utilization of LLMs, we can &lt;em&gt;flatten the playing field&lt;/em&gt; for tens of millions of kids, with incredible downstream consequences for the economy at large. The specific interventions we have envisioned to achieve this are:&lt;/p&gt;

&lt;h3 id=&quot;1-textbook2chatbot-short-form-videos-and-teaching-aids-flattening-the-playing-field-and-acad-tik-toks&quot;&gt;1) Textbook2Chatbot, Short-form videos and teaching-aids: Flattening the playing-field and acad-tik-toks:&lt;/h3&gt;

&lt;p&gt;The concept of office hours is almost non-existent in the Indian setup. In our own personal experience, we had to dig answers out of the deep trenches of YouTube or class notes posted by western university professors to attain a certain level of clarity and mastery over the syllabi we were grappling with. Much like us, students often have no support structure to aid them in their academic journeys once they leave the school premises, which is why private-tutoring enrollment rates are so high even in non-entrance-examination scenarios. Now, with the &lt;em&gt;BYOD (Bring your own data)-Generative-AI&lt;/em&gt; revolution unfolding, we can turn textbooks into chatbots that students anywhere can access and chat with anytime on their smartphones (see the retrieval sketch after the list below). We can also orchestrate this so that the chatbot runs natively on the phone, further alleviating concerns of asymmetrical implementation across areas with higher or lower levels of internet penetration. The other novel innovation we can unleash is AI-generated short video content. Given that we are educating the TikTok/YouTube-Shorts generation with its shortening attention span, we can condense and summarize study materials into (sub)-minute video chunks and build &lt;em&gt;academic-social-networks&lt;/em&gt; that leverage the network virality effect to popularize knowledge-flow across the student body.
As far as harnessing Generative AI in classrooms is concerned, universities such as &lt;a href=&quot;https://www.npr.org/2023/01/26/1151499213/chatgpt-ai-education-cheating-classroom-wharton-school&quot;&gt;U-Penn&lt;/a&gt; and the &lt;a href=&quot;https://www.rochester.edu/newscenter/chatgpt-artificial-intelligence-ai-chatbots-education-551522/&quot;&gt;University of Rochester&lt;/a&gt; have already stolen a march in this regard. Of particular note is the paper &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4391243&quot;&gt;&lt;em&gt;Using AI to Implement Effective Teaching Strategies in Classrooms: Five Strategies, Including Prompts&lt;/em&gt;&lt;/a&gt; by Dr. Ethan Mollick and Dr. Lilach Mollick at the Wharton School of Business (University of Pennsylvania), whose five-pronged strategy is summarized below:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Strategy 1: Using AI to Produce Many Varied Examples&lt;/li&gt;
  &lt;li&gt;Strategy 2: Using AI to Provide Multiple Explanations&lt;/li&gt;
  &lt;li&gt;Strategy 3: Using AI to Develop Low-Stakes Tests&lt;/li&gt;
  &lt;li&gt;Strategy 4: Using AI to Assess Student Learning&lt;/li&gt;
  &lt;li&gt;Strategy 5: Using AI to Distribute Practice of Important Ideas&lt;/li&gt;
&lt;/ul&gt;
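
&lt;p&gt;Returning to the Textbook2Chatbot idea above, here is a minimal retrieval sketch. The passages and the TF-IDF retriever are my own illustrative stand-ins; the final generation step is a placeholder for whatever on-device model is deployed:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Textbook2Chatbot sketch: retrieve the most relevant textbook passage
# for a student's question, then hand it to a small on-device LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical textbook, pre-split into passages (e.g., chapter chunks).
passages = [
    'Photosynthesis converts light energy into chemical energy...',
    'The mitochondrion is the powerhouse of the cell...',
    'Newton\'s second law: force equals mass times acceleration...',
]

vectorizer = TfidfVectorizer().fit(passages)
passage_vecs = vectorizer.transform(passages)

def answer(question):
    # Rank passages by cosine similarity to the question.
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, passage_vecs)[0]
    context = passages[scores.argmax()]
    prompt = 'Answer using only this passage:\n' + context + '\nQ: ' + question
    # Placeholder: a real deployment would return llm.generate(prompt)
    return prompt

print(answer('What does photosynthesis do?'))
&lt;/code&gt;&lt;/pre&gt;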

&lt;p&gt;As exciting as these possibilities sound, we also need to ensure that LLMs are not integrated over-zealously en masse across all departments, keeping in mind the recent &lt;a href=&quot;https://abc7news.com/vanderbilt-university-chatgpt-openai-chat-gpt/12859921/&quot;&gt;Vanderbilt&lt;/a&gt; fiasco where officials at Vanderbilt University had to apologize to the student body for callously using ChatGPT to craft a consoling email addressing the mass shooting at Michigan State University.&lt;/p&gt;

&lt;h3 id=&quot;2-curricula-development&quot;&gt;&lt;strong&gt;2) Curricula development:&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The emergence of LLMs has had a transformative effect on the very institution of computer programming and has birthed a whole new flavor of technology perched on new ideas. The simultaneous rise of low/no-code tools such as &lt;a href=&quot;https://bubble.io&quot;&gt;bubble.io&lt;/a&gt; has made it possible for anyone with an idea and some minimal technical aptitude to create their own app! This has led to the birth of an entirely new field that encapsulates a set of practices aimed at deploying and maintaining LLMs in production reliably and efficiently, termed &lt;a href=&quot;https://github.com/tensorchord/awesome-open-source-llmops&quot;&gt;LLMOps&lt;/a&gt; (an offshoot of MLOps). Similarly, we have also seen the emergence of an entirely new artform that deals with designing crafty prompt-inputs deviating from &lt;em&gt;plain vanilla&lt;/em&gt; human-language prompts with the goal of getting LLMs to generate desired outputs (see cheat sheet below). This is called &lt;a href=&quot;https://www.cbsnews.com/news/ai-artificial-intelligence-chatgpt-jobs-prompt-engineer/&quot;&gt;&lt;em&gt;prompt engineering&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_Untitled%201.png&quot; alt=&quot;ChatGPT prompt cheat-sheet sourced from: https://hasantoxr.gumroad.com/l/cc&quot; /&gt;&lt;/p&gt;

&lt;p&gt;ChatGPT prompt cheat-sheet sourced from: &lt;a href=&quot;https://hasantoxr.gumroad.com/l/cc&quot;&gt;https://hasantoxr.gumroad.com/l/cc&lt;/a&gt;&lt;/p&gt;
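
&lt;p&gt;As a hedged illustration of what the cheat-sheet above catalogues, here is a made-up contrast between a plain-vanilla prompt and an engineered one (the role, format and follow-up constraints are my own examples):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# A plain prompt leaves audience, depth and format to chance.
plain_prompt = 'Explain photosynthesis.'

# An engineered prompt pins down role, audience, format and follow-up.
engineered_prompt = '\n'.join([
    'You are a patient high-school biology tutor.',         # role
    'Explain photosynthesis to a 14-year-old.',             # task + audience
    'Use exactly three short paragraphs and one analogy.',  # format constraint
    'End with two self-check questions for the student.',   # follow-up
])
&lt;/code&gt;&lt;/pre&gt;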

&lt;p&gt;We have already witnessed the emergence of boot-camp styled coursework teaching the best practices associated with these newly emergent fields. For example, the &lt;a href=&quot;https://fullstackdeeplearning.com/&quot;&gt;&lt;em&gt;Full stack deep learning&lt;/em&gt;&lt;/a&gt; team has already begun conducting LLM-Bootcamps (see &lt;a href=&quot;https://fullstackdeeplearning.com/llm-bootcamp/&quot;&gt;https://fullstackdeeplearning.com/llm-bootcamp/&lt;/a&gt;) that cost $950, and tutors on Udemy have begun selling a vast array of Prompt-engineering courses for $19.99 (also see the &lt;em&gt;LearnPromptEngineering&lt;/em&gt; resource page here: &lt;a href=&quot;https://learnprompting.org/docs/category/-applied-prompting&quot;&gt;https://learnprompting.org/docs/category/-applied-prompting&lt;/a&gt;). It is keeping in mind these rapid developments that we present a typology of curricular incorporation of these skill sets based on their temporal span:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;1 day/weekend: Bootcamp and certification courses hyper-focused on specific topics such as &lt;a href=&quot;https://fullstackdeeplearning.com/llm-bootcamp/&quot;&gt;LLMOps&lt;/a&gt;, &lt;a href=&quot;https://www.udemy.com/course/promptengineering/&quot;&gt;Prompt Engineering&lt;/a&gt; and &lt;a href=&quot;https://www.jailbreakchat.com/&quot;&gt;Jailbreak landscapes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;1 week: &lt;a href=&quot;https://www.youtube.com/watch?v=EJ-MG7c2Yqg&amp;amp;ab_channel=%23NoCodeAdvantage&quot;&gt;App&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/watch?v=b_3qa0uYVb8&amp;amp;ab_channel=CodexCommunity&quot;&gt;website&lt;/a&gt; building with just LLMs and no-code tools for creative entrepreneurs with non-CS backgrounds&lt;/li&gt;
  &lt;li&gt;1 semester: Elective subjects offered as part of CS, ECE and allied undergraduate and graduate courses covering topics such as competing LLM architectures, RLHF etc. A more intensive version of this can also be turned into a PG Diploma / Certification course.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;1-2 years: MA / MS in LLMs.&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;As motivated in the section on LLMOps, the exponential growth of advances in this field, along with its idiosyncratic instrumentation paradigm, has necessitated that we consider the possibility of offering a 1-1.5 year MS course specializing in LLMs. In this regard, we present a blueprint of what the course contents could look like:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;Technical capacity building module: This module prepares the students to confidently handle the mathematical rigor and the software-engineering intricacies associated with training, finetuning and deploying LLMs.
        &lt;ol&gt;
          &lt;li&gt;Deep learning building blocks: NLP fundamentals, Tokenization, Transformer architecture&lt;/li&gt;
          &lt;li&gt;Advances in distributed computing paradigms: GPU architecture, Specialized frameworks to train LLMs, Inference orchestration and model serving&lt;/li&gt;
          &lt;li&gt;Advances in foundation models (Along the lines of &lt;a href=&quot;https://stanford-cs324.github.io/winter2023/&quot;&gt;CS324&lt;/a&gt; developed at Stanford)&lt;/li&gt;
          &lt;li&gt;Reinforcement learning, alignment and RLHF&lt;/li&gt;
          &lt;li&gt;MLOps and LLM-ops&lt;/li&gt;
          &lt;li&gt;Prompt engineering&lt;/li&gt;
        &lt;/ol&gt;
      &lt;/li&gt;
      &lt;li&gt;Iconoclasm building module
        &lt;ol&gt;
          &lt;li&gt;Critical study of datasets used to train LLMs&lt;/li&gt;
          &lt;li&gt;Survey of biases baked into LLMs&lt;/li&gt;
          &lt;li&gt;Landscape of hallucinations exhibited by LLMs&lt;/li&gt;
          &lt;li&gt;History of tech, ELIZA effect and the Clever Hans phenomena&lt;/li&gt;
        &lt;/ol&gt;
      &lt;/li&gt;
      &lt;li&gt;Creativity module:
        &lt;ol&gt;
          &lt;li&gt;History of generative art&lt;/li&gt;
          &lt;li&gt;Philosophical underpinnings and qualia of human creativity&lt;/li&gt;
        &lt;/ol&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Economics of LLMs&lt;/p&gt;

        &lt;p&gt;a. Cost of training these models&lt;/p&gt;

        &lt;p&gt;b. Carbon footprint of these models&lt;/p&gt;

        &lt;p&gt;c. Downstream repercussions on the labor sector&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;Live Project work / thesis / term-paper&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;c-emphasizing-the-whyness-of-the-non-technical-facets-of-the-course&quot;&gt;3) Emphasizing &lt;strong&gt;the whyness of the &lt;em&gt;non-technical&lt;/em&gt; facets of the course:&lt;/strong&gt;&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;On nourishing LLM-iconoclasm amongst students:&lt;/strong&gt;
 An LLM is quintessentially a weighted directed graph adroit at conditional next-token prediction. The fact that these weights are fine-tuned on internet sized corpuses has resulted in a slew of emergent properties, the most dangerous of which is human-like fluency in sentence formation. This uncanny human-like fluency attainment, in turn lends some verisimilitude to human-like-intelligence, an apparition that needs to be addressed on a war footing in the academic syllabi. Given that the training dataset is typically opaque, we need to take results such as the model passing some marquee exam with a grain, or rather, a pint of salt. These ‘breakthroughs’ can very well be explained away as training-set memorization, which is not a massive breakthrough by any stretch of imagination. Associating anthropocentric terms such as &lt;em&gt;sentience, intelligence, morality&lt;/em&gt; and &lt;em&gt;emotionality&lt;/em&gt; makes no sense in the context of LLMs and these associations have serious downstream implications for the society at large. Hence, students need to be given a grand tour of the models’ fallibilities, perhaps even emphasizing on the more frivolous ones so that it remains anchored in their minds that they are ultimately grappling with gargantuan token-vomit-machines (TVMs) that are just another kind of a machine learning model and nothing else. This flavor of iconoclasm we argue will turn the students into better technologists, especially when combined with the technical nous garnered from the other modules.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;On the human creativity module:&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;We believe that providing students with a front-row seat to state-of-the-art AI-creativity advancements, whilst also explaining to them the quintessential qualia of human creativity, will not just make them skillful at using these Generative-AI-aided tools but will also allow them to clearly understand what makes human-generated art &lt;em&gt;tick&lt;/em&gt;. This can help groom a strong bastion of human creativity, which will be vital in a future where mass-generated AI content has flooded the zeitgeist. It is very plausible that in the coming years, &lt;em&gt;fresh&lt;/em&gt; human-generated data will be hard to come by, and future versions of GPT-x will be incestuously trained on data that their previous variants generated.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;b-active-interventions&quot;&gt;b) Active interventions&lt;/h2&gt;

&lt;p&gt;In this section, we present interventions where the student body actively participates in shaping, molding and contouring the LLMs, ensuring they are rendered more robust, safer and ultimately useful, both for society at large and for the incoming student cohort. With this in mind, we present an &lt;em&gt;HF lab framework&lt;/em&gt; that can be laid out in 6 phases.&lt;/p&gt;

&lt;h3 id=&quot;1-the-6-phase-framework&quot;&gt;1) The 6 phase framework&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;A multi-year MoU with an LLM institution:
Right now, LLM-land is littered with competing startups, all knowing very well that the secret sauce for breaking away from the pack is one whose recipe is out in the open: &lt;strong&gt;&lt;em&gt;Human feedback&lt;/em&gt;&lt;/strong&gt;! In lieu of going to another data-collection startup with a dodgy labor-rights record, we propose seeking out formal partnerships in which students act as providers of human feedback. This will ensure not only that the generated data is of far higher quality and more nuanced, but also that the students are well compensated for their effort, either as academic tuition rebates or as salaries. In Phase-1, we can pick the LLM partner that offers the best terms of agreement using a cost function that can be a weighted average of the following 5 factors (see the scoring sketch after this list):
a) Revenue per student
b) Infrastructural investments
c) Post-graduation employment opportunities
d) Faculty-training and upskilling
e) Technology-sharing&lt;/li&gt;
&lt;/ol&gt;
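
&lt;p&gt;A minimal sketch of the proposed cost function (the weights and the two term sheets below are made-up illustrations; an institution would set its own priorities):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Weighted-average scoring of candidate LLM partners across the five
# factors listed above. All numbers are hypothetical.
FACTORS = ['revenue_per_student', 'infrastructure', 'employment',
           'faculty_training', 'tech_sharing']

# Institutional priorities; must sum to 1.
WEIGHTS = {'revenue_per_student': 0.30, 'infrastructure': 0.25,
           'employment': 0.20, 'faculty_training': 0.15,
           'tech_sharing': 0.10}

def partner_score(offer):
    '''offer maps each factor to a normalized 0-to-1 score.'''
    return sum(WEIGHTS[f] * offer[f] for f in FACTORS)

# Two hypothetical term sheets:
offer_a = {'revenue_per_student': 0.8, 'infrastructure': 0.4,
           'employment': 0.6, 'faculty_training': 0.5, 'tech_sharing': 0.3}
offer_b = {'revenue_per_student': 0.5, 'infrastructure': 0.9,
           'employment': 0.7, 'faculty_training': 0.6, 'tech_sharing': 0.8}

scores = {'A': partner_score(offer_a), 'B': partner_score(offer_b)}
print(scores, 'preferred:', max(scores, key=scores.get))
&lt;/code&gt;&lt;/pre&gt;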

&lt;p&gt;Scale clearly matters here, and we argue that the signatories of such MoUs ought to be academic groups such as the &lt;a href=&quot;https://in.linkedin.com/company/jaingroup&quot;&gt;Jain Group of Institutions&lt;/a&gt; (85 educational institutions, 51,600 students) or &lt;a href=&quot;https://en.wikipedia.org/wiki/Manipal_Academy_of_Higher_Education&quot;&gt;MAHE&lt;/a&gt; (~23,700 students), and not standalone engineering colleges.&lt;/p&gt;

&lt;ol start=&quot;2&quot;&gt;
  &lt;li&gt;Infrastructure deployment:
In this phase, we build out the infrastructure needed to orchestrate the human feedback sessions, which has the following dimensions: Physical, Curricular and Software.
    &lt;ol&gt;
      &lt;li&gt;Physical: This can be a separate on-campus human feedback lab or a rented-out space such as the computer science labs or the library. We would also strongly insist that there be a physical ethics and mental-wellbeing institute on-campus, staffed with counselors who can help the students navigate the emotional and philosophical questions that may arise through the process. Another facet of physical infrastructure would be the hardware that constitutes the terminals the students will be working on, and a secure high-bandwidth broadband connection to connect the lab to the LLM servers.&lt;/li&gt;
      &lt;li&gt;Curricular: The students being initiated into the HF centers ought to be educated about the &lt;em&gt;whyness&lt;/em&gt; of the task they are being exposed to and what the expectations are, which leads us to the next facet: software infrastructure.&lt;/li&gt;
      &lt;li&gt;Software: The software infrastructure would cover components such as culturally grounded graphical user interfaces with language-translation technology baked in, allowing the students to seamlessly critique, rank and flag the outputs generated by the LLMs.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Stress-testing and red-teaming phase with experts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/whey_Untitled%202.png&quot; alt=&quot;Fig 12: Source: https://arxiv.org/pdf/2210.07700.pdf&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Fig 12: Source: &lt;a href=&quot;https://arxiv.org/pdf/2210.07700.pdf&quot;&gt;https://arxiv.org/pdf/2210.07700.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before exposing the students to the LLMs, we ought to ensure there is a phase of red-teaming by on-campus experts who will stress-test the LLM-feedback-tech so that the students are not exposed to any harm. The template provided in the recent work &lt;a href=&quot;https://arxiv.org/pdf/2210.07700.pdf&quot;&gt;&lt;em&gt;Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey&lt;/em&gt;&lt;/a&gt; (as seen in the figure above) covers the landscape of harms and can be used to construct a laundry list of anticipated harms and the efforts undertaken to quell them. We argue that it is imperative that the LLM company with whom the MoU has been signed be actively involved in this crucial phase of deployment.&lt;/p&gt;

&lt;ol start=&quot;4&quot;&gt;
  &lt;li&gt;Introduction, limitation-awareness and orientation boot-camp:
Before starting the feedback labs, the selected group of students needs to be enrolled in a bootcamp-styled academic training where they are exposed to the thrills and challenges of the task ahead, with enhanced focus on safety protocols, potential harms and nuances about the nature of the technology.&lt;/li&gt;
  &lt;li&gt;Pilot Project / 1 semester with PG:
Given that we are in uncharted territory, we strongly recommend that before scaling in a cavalier manner, we test the efficacy in one or two pilot projects spanning an entire academic semester with pre-established metrics, and ensure that the proposed framework &lt;em&gt;works&lt;/em&gt;. During this pilot study, we recommend that counselors be present in the lab alongside the students at all times, so that the first signs of any event that could potentially traumatize a student are caught and nipped in the bud.&lt;/li&gt;
  &lt;li&gt;Scaling across sister institutions:
Once we ensure that the metrics tracked during the pilot study are indeed met, we can scale in a multi-phase manner across the sister institutions of the academic group. If strongly negative trends or traits emerge during the first pilot study, we implore the administrators to repeat it after sufficiently incorporating rectifications for the problems encountered.&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Sat, 31 Jan 2026 06:09:06 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/ai/education/2026/01/31/gas_whey/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/ai/education/2026/01/31/gas_whey/</guid>
        
        <category>india</category>
        
        <category>reinforcement-learning</category>
        
        <category>data-moats</category>
        
        
        <category>ai</category>
        
        <category>education</category>
        
      </item>
    
      <item>
        <title>Mindset propagation probing by simulating social networks of LLMs</title>
        <description>&lt;h2 id=&quot;background-what-led-to-this&quot;&gt;Background: What led to this&lt;/h2&gt;
&lt;p&gt;Academically, I have led a journeyman’s life and my previous work &lt;a href=&quot;#ref1&quot;&gt;[1]&lt;/a&gt; has spanned network sciences, statistical physics, information theory, computational social science, machine learning, human kinematics, computer vision and data ethics. I have a dual PhD (ECE + CS) from Carnegie Mellon University where my doctoral thesis &lt;a href=&quot;#ref2&quot;&gt;[2]&lt;/a&gt; investigated the phenomenology of the &lt;em&gt;information contagion&lt;/em&gt; on Online Social Networks (OSNs) using tools from Graph Theory, Statistical Physics and Communication Theory.&lt;/p&gt;

&lt;p&gt;In my recent capacity as the CEO of HAL51 AI, I had to don the role of an &lt;em&gt;LLM-whisperer&lt;/em&gt;, updating our guard-railing mechanisms every week to ensure our real-world deployed educational co-pilots &lt;a href=&quot;#ref3&quot;&gt;[3]&lt;/a&gt; would &lt;em&gt;stay on course&lt;/em&gt; in a sensitive setting such as a classroom. This has given me the proverbial ringside view of how young minds interact with AI-powered novel interfaces. This idea came to me in the midst of a noisy classroom in Fremont last year.&lt;/p&gt;

&lt;h2 id=&quot;the-core-idea-of-mindset-propagation-probing&quot;&gt;The core idea of mindset propagation probing&lt;/h2&gt;
&lt;p&gt;In my previous work on &lt;em&gt;“Latent Sentiment Detection in Online Social Networks: A Communications-oriented View”&lt;/em&gt; &lt;a href=&quot;#ref4&quot;&gt;[4]&lt;/a&gt;, I had investigated an exemplar manifestation of viral mindset propagation on social networks by modeling “&lt;em&gt;Hashtag-Hijacking&lt;/em&gt;” on Twitter using Markov Random Fields (MRFs). This resulted in a communications-theoretic framework for characterizing the probability of error of detecting the underlying latent sentiment that introduced a new factor: &lt;strong&gt;Network topology&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Recent advancements in large language modeling (LLM) have given us the three requisite ingredients:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Natural conversation generation:&lt;/strong&gt; Large language models with the ability to simulate human-styled natural conversations.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Mindset absorption:&lt;/strong&gt; Fine-tuning methods that allow us to harvest the conversations and effect internal weight changes.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Mechanistic probing:&lt;/strong&gt; A &lt;a href=&quot;https://arxiv.org/abs/2507.21509&quot;&gt;Persona Vectors Framework (PVF)&lt;/a&gt; &lt;a href=&quot;#ref5&quot;&gt;[5]&lt;/a&gt; that allows us to probe, measure, locate, assign and manipulate personas associated with these large language models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To this end, I propose a rigorous empirical framework for modeling mindset propagation by simulating a social network of LLMs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/llm_graph.png&quot; alt=&quot;Mindset propagation in social networks of LLMs&quot; /&gt;
&lt;em&gt;Figure 1: Mindset propagation in social networks of LLMs&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;step-wise-rendition-of-the-underlying-framework&quot;&gt;Step-wise rendition of the underlying framework&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;We begin with a pre-fixed topology $G_{fixed}(V,E,W)$. We define Influencer nodes and Influencee nodes.&lt;/li&gt;
  &lt;li&gt;The mandate of the &lt;em&gt;influencer&lt;/em&gt; node is to propagandize and use persuasive conversations to alter the mindset of the &lt;em&gt;influencee&lt;/em&gt; nodes.&lt;/li&gt;
  &lt;li&gt;We facilitate &lt;em&gt;Contiones&lt;/em&gt; styled sessions spanning a few hundred conversations.&lt;/li&gt;
  &lt;li&gt;After every session, the influencee nodes go through a &lt;em&gt;reflection phase&lt;/em&gt; in which their internal weights are fine-tuned on the session transcripts.&lt;/li&gt;
  &lt;li&gt;The sessions repeat and we track the temporal evolution of the mindstate of the influencee nodes (a minimal simulation skeleton follows below).&lt;/li&gt;
&lt;/ol&gt;
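
&lt;p&gt;A minimal simulation skeleton of the five steps above (the graph is a stock stand-in for $G_{fixed}(V,E,W)$, and the session, fine-tuning and probing calls are placeholders for the components described, not a real library API):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Skeleton of the mindset-propagation loop: fixed topology, one
# influencer, per-session reflection, and temporal mindstate tracking.
import networkx as nx

G = nx.karate_club_graph()           # stand-in for G_fixed(V, E, W)
influencers = {0}                    # node 0 propagandizes
influencees = set(G.nodes()) - influencers

def run_session(src, dst):
    '''Placeholder: the two LLM agents hold a few hundred turns.'''
    return ['...transcript of a persuasive conversation...']

def reflect_and_finetune(agent_id, transcript):
    '''Placeholder: fine-tune the influencee agent on the transcript.'''
    pass

def probe_mindstate(agent_id):
    '''Placeholder: project activations onto a persona vector [5].'''
    return 0.0

history = []
for session in range(10):
    for u in influencers:
        for v in G.neighbors(u):
            if v in influencees:
                reflect_and_finetune(v, run_session(u, v))
    # Step 5: track the temporal evolution of influencee mindstates.
    history.append({v: probe_mindstate(v) for v in influencees})
&lt;/code&gt;&lt;/pre&gt;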

&lt;hr /&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a name=&quot;ref1&quot;&gt;&lt;/a&gt;[1] &lt;a href=&quot;https://scholar.google.com/citations?user=5Lck_J0AAAAJ&amp;amp;hl=en&quot;&gt;Google Scholar Profile&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a name=&quot;ref2&quot;&gt;&lt;/a&gt;[2] &lt;a href=&quot;https://kilthub.cmu.edu/articles/thesis/Network_Aided_Classification_and_Detection_of_Data/7430012?file=13756967&quot;&gt;Doctoral Thesis (CMU)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a name=&quot;ref3&quot;&gt;&lt;/a&gt;[3] &lt;a href=&quot;https://www.youtube.com/watch?v=J-ihwfPD3YA&quot;&gt;Educational Co-pilots Demo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a name=&quot;ref4&quot;&gt;&lt;/a&gt;[4] &lt;a href=&quot;https://arxiv.org/pdf/1401.2113&quot;&gt;Latent Sentiment Detection paper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a name=&quot;ref5&quot;&gt;&lt;/a&gt;[5] &lt;a href=&quot;https://arxiv.org/abs/2507.21509&quot;&gt;Persona Vectors Framework (PVF)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sat, 31 Jan 2026 00:15:00 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/llm-sociology/2026/01/31/mindset_prop/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/llm-sociology/2026/01/31/mindset_prop/</guid>
        
        <category>artificial-intelligence</category>
        
        <category>network-science</category>
        
        <category>llm</category>
        
        <category>persona-vectors</category>
        
        
        <category>research</category>
        
        <category>llm-sociology</category>
        
      </item>
    
      <item>
        <title>Matilda, Mars and Markup: The curious case of Mrs. Agnes Zevens</title>
        <description>&lt;p&gt;&lt;a href=&quot;https://people.idsia.ch/~juergen/deep-learning-history.html&quot;&gt;Jürgen Schmidhuber&lt;/a&gt; often wears his heart on his sleeve and does so rather publicly via his much read &lt;a href=&quot;https://people.idsia.ch/~juergen/&quot;&gt;personal website&lt;/a&gt;. Every now and then, his &lt;em&gt;look-I-did-it-in-the-90s&lt;/em&gt; blog posts make it to the front page of Hackernews and well, in this case, Sabine Hossenfelder’s youtube channel too (See “&lt;a href=&quot;https://youtu.be/PykNdM4v4Xo?si=qyR-M6BjsjcFzMCT&quot;&gt;Plagiarism Charges Against Nobel Prize for Artificial Intelligence&lt;/a&gt;”).
This particular episode on “&lt;a href=&quot;https://news.ycombinator.com/item?id=44941963&quot;&gt;Who invented Backpropagation&lt;/a&gt; (BP) ” (See original blog &lt;a href=&quot;https://people.idsia.ch/~juergen/who-invented-backpropagation.html#BPA&quot;&gt;here&lt;/a&gt;) led me to one of the papers I had long bookmarked but never actually perused: H. J. Kelley’s &lt;em&gt;“Gradient Theory of Optimal Flight Paths ”. ARS Journal, Vol. 30, No. 10, pp. 947-954, 1960.&lt;/em&gt;&lt;/p&gt;

&lt;h1 id=&quot;1-backdrop-of-backprop&quot;&gt;1. Backdrop of backprop&lt;/h1&gt;

&lt;p&gt;Backpropagation (BP) is oft described as the &lt;a href=&quot;https://radical.vc/geoffrey-hinton-on-the-algorithm-powering-modern-ai/&quot;&gt;backbone of modern AI&lt;/a&gt;. As so eloquently put in this &lt;a href=&quot;https://www.jmlr.org/papers/volume18/17-468/17-468.pdf#page=2.91&quot;&gt;auto-diff survey paper&lt;/a&gt;, BP converts the heady challenge of &lt;em&gt;learning&lt;/em&gt; into a gradient-descent adventure in the neural network’s weight space, eventually reaching a decentish minimum of the objective function being targeted. The terminology emerges rather organically, as the process literally entails &lt;em&gt;back&lt;/em&gt;ward &lt;em&gt;prop&lt;/em&gt;agation of the sensitivity of the objective value at the output (see figure below).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!lU_H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cbfa3c2-b1bb-40f3-b879-0f1cb0295627_1242x793.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-01.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://arxiv.org/abs/1502.05767&quot;&gt;Automatic differentiation in machine learning: a survey&lt;/a&gt; by Baydin et al&lt;/p&gt;
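
&lt;p&gt;For the uninitiated, here is a toy numeric illustration of that backward flow: a from-scratch two-layer network with a squared loss. (This is the modern textbook form of the algorithm, not Kelley’s 1960 formulation.)&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Backpropagation in miniature: the loss sensitivity is propagated
# backward from the output, layer by layer, via the chain rule.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 features
y = rng.normal(size=(4, 1))          # regression targets
W1 = rng.normal(size=(3, 5)) * 0.1   # input-to-hidden weights
W2 = rng.normal(size=(5, 1)) * 0.1   # hidden-to-output weights

for step in range(500):
    # Forward pass
    h = np.tanh(x @ W1)
    y_hat = h @ W2
    loss = np.mean((y_hat - y) ** 2)
    # Backward pass: sensitivities flow output-to-input
    d_yhat = 2 * (y_hat - y) / len(y)
    dW2 = h.T @ d_yhat
    d_h = d_yhat @ W2.T
    dW1 = x.T @ (d_h * (1 - h ** 2))  # tanh'(z) = 1 - tanh(z)^2
    # Gradient-descent step in weight space
    W1 = W1 - 0.1 * dW1
    W2 = W2 - 0.1 * dW2

print('final loss:', loss)
&lt;/code&gt;&lt;/pre&gt;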

&lt;p&gt;There’s been somewhat of a mild consensus in the bylanes of AI history that Kelley’s paper was &lt;em&gt;the&lt;/em&gt; canonical work that brought BP into mainstream academic discourse and first presented a reasonably recognizable precursor of modern backpropagation. (There’s an accompanying chapter on “&lt;a href=&quot;https://gwern.net/doc/ai/1962-kelley.pdf&quot;&gt;Method of Gradients&lt;/a&gt;”, published two years later, as well.)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!qTqU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9fd58a0-2dfe-4978-86a1-c39a5c3dc187_1410x400.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-02.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Acknowledgement sections of the two canonical papers&lt;/p&gt;

&lt;h1 id=&quot;2-marginalia-sub-authorship--matilda-effect&quot;&gt;2. Marginalia, “Sub-authorship” &amp;amp; Matilda effect&lt;/h1&gt;

&lt;p&gt;Fascinatingly, when I got to the end of the paper, the acknowledgement section caught my attention (See image above). It read:
“&lt;em&gt;Acknowledgments: The writer is pleased to acknowledge the contributions of Messrs. William P. O’Dwyer and H. Gardner Moyer of Grumman’s Computation Facility in handling the computational phase of this study on the IBM 704, and of &lt;strong&gt;Mrs. Agnes Zevens&lt;/strong&gt; of the Systems Research Section in &lt;strong&gt;checking and preparing the numerical results for publication&lt;/strong&gt;.&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;This hit me like a ton of bricks for a very specific reason. I had recently learnt of a sinister arrangement of “&lt;a href=&quot;https://journals.sagepub.com/doi/10.1177/053901847301200604&quot;&gt;sub-authorship&lt;/a&gt;” targeting women in science (and especially computing), where their contributions were &lt;a href=&quot;https://www.nature.com/articles/s41586-022-04966-w&quot;&gt;systematically&lt;/a&gt; &lt;a href=&quot;https://www.google.com/books/edition/Women_Scientists_in_America/jJr6ZfkDbE4C?hl=en&quot;&gt;relegated&lt;/a&gt; to the marginalia (typically the “acknowledgments section”) rather than being awarded co-authorship. This was a textbook, rank-and-file &lt;a href=&quot;https://journals.sagepub.com/doi/abs/10.1177/1075547012472684&quot;&gt;Matilda Effect&lt;/a&gt; unfolding in front of me, which triggered the urge to reach out and learn more about this paper’s backstory. I must mention that I was emboldened by my previous success in this regard. &lt;a href=&quot;https://x.com/vinayprabhu/status/1389316085074198531&quot;&gt;Back in 2021&lt;/a&gt;, I reached out to Prof. Gertraud Fenk-Oczlon about “&lt;em&gt;Konstanz im Kurzzeitgedächtnis - Konstanz im sprachlichen Informationsfluß?&lt;/em&gt;” (“Constancy in short-term memory - constancy in the linguistic flow of information?”), published in 1980, which introduced the “&lt;a href=&quot;https://aclanthology.org/2021.emnlp-main.74/&quot;&gt;Uniform Information Density&lt;/a&gt;” hypothesis to the NLP literature. As it turns out, she was never credited for this work, and none of the NLP-adjacent courses I took as a PhD student at CMU mentioned her work.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!CLXF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59e2ef87-3490-479f-8e1d-4271675d200f_1468x554.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-03.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Landscape of samples of UID being cited in NLP papers&lt;/p&gt;

&lt;p&gt;As it turns out, one of the professors who had appropriated her work reached out in 2020 and personally apologized, whilst acknowledging that her work ought to have been cited!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!0lbz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50ae0726-2d38-44ad-b0f3-e492ad68706b_508x372.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-04.png&quot; alt=&quot;Image&quot; title=&quot;Image&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Correspondence with Gertraud Fenk-Oczlon: &lt;a href=&quot;https://scholar.google.si/citations?user=dVMVtWAAAAAJ&amp;amp;hl=en&quot;&gt;https://scholar.google.si/citations?user=dVMVtWAAAAAJ&amp;amp;hl=en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, as I began to search for Mrs. Agnes’ contact details, I chanced upon this &lt;a href=&quot;https://sortedbyname.com/letter_z/zevens.html&quot;&gt;page&lt;/a&gt; that read: “&lt;em&gt;ZEVENS, AGNES STARIN, also known as AGNES MICHALOWSKI, AGNES ZEVENS MICHALOWSKI and AGNES STARIN, was born 3 February 1920, is listed with the following 3 birthplaces: NEW YORK MAN, New York; N Y C, New York; NEW YORK CIT, New York, daughter of RICHARD STARIN (father) and HELEN THYM, was assigned Social Security number 104-03-9459 (indicating New York), and died 24 May 1999, while residing in Zip Code 11714-5910 (Bethpage, New York, within Nassau County).&lt;/em&gt;”
The dates did line up. The project that Mrs. Zevens was allocated to began roughly in 1958, funded by the Office of Aerospace Research (OAR), USAF (see image below), and it was typical for senior women in their late thirties to be allocated to such important numerical jousts.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!12ut!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d5b8a17-9af3-4c77-8f79-de63379a8793_328x411.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-05.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The OAR USAF grant document sourced from the Internet Archive: &lt;a href=&quot;https://dn790002.ca.archive.org/0/items/DTIC_AD0600160/DTIC_AD0600160.pdf&quot;&gt;https://dn790002.ca.archive.org/0/items/DTIC_AD0600160/DTIC_AD0600160.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The locations lined up too. Grumman’s research facilities were located exactly in the hometown where she had resided: Bethpage, NY.
So, yeah, the code-reviewer-PM of the pioneering back-prop paper project had passed away in 1999, during the AI winter of the ’90s, without so much as a blip. I am sure there are many such Mrs. Agneses out there, buried in the acknowledgement sections. If someone from the Google Scholar team ever reads this, here’s an utterly doable, reasonable fix: scrape the acknowledgement sections of all the papers you have archived from this era, extract the names, run them through your SOTA name-disambiguation pipelines and posthumously cite them. It is the least we can do! (Citation-puritans can go bleep themselves.)&lt;/p&gt;

&lt;h1 id=&quot;3-last-mile-trivia&quot;&gt;3. Last mile trivia&lt;/h1&gt;

&lt;p&gt;A few trivia-hued concluding thoughts: the original MCP (Mars Colonization Project) in AI, the markup aesthetics of parenthesized citations, and the minuscule computing magic of the IBM 704.&lt;/p&gt;

&lt;h2 id=&quot;3a-the-original-mcp-mars-colonization-project&quot;&gt;3a: The original MCP: Mars Colonization Project&lt;/h2&gt;

&lt;p&gt;One of the most famous stories in AI tech-lore is one that unfolded in 2012, when Elon Musk gave Demis Hassabis a tour of SpaceX headquarters and described Mars as a “backup planet” in case something went wrong with humanity on Earth. Apparently, Hassabis responded by asking, “What if AI was the thing that went wrong?”, and noted how easy it would be for a rogue AI to follow humanity to Mars through our communication systems and destroy the colony there.
This apocryphal tale is also oft used to ground the metaphysical nexus between X.ai and SpaceX, as well as a real-world proof-of-polymathism of the visionary behind these ambitious projects.
To me, it is both uncanny and yet weirdly reasonable that the pioneering work on the workhorse algorithm of modern AI quite literally targeted finding the minimum-time planar flight paths from Earth’s orbit to the orbit of Mars as the marquee numerical application to sell its efficacy.&lt;/p&gt;

&lt;h2 id=&quot;3b-markup-aesthetics-and-notational-dissonance&quot;&gt;3b: Markup aesthetics and Notational dissonance&lt;/h2&gt;

&lt;p&gt;One of the unexpected, quirky challenges of reading Kelley’s paper was the formatting and the ensuing notational dissonance. As it turns out, certain journals in those days mandated the usage of parentheses for citations and square brackets to denote equations. It is reasonable to assume that familiar notation is processed more easily, while deviations require extra cognitive effort and feel “wrong” or uncomfortable, and this one certainly did!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!lDVT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1beeb9ac-38c1-4dea-9b03-1890324509b9_815x1113.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-06.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Parentheses for citations?!&lt;/p&gt;

&lt;p&gt;Given the &lt;a href=&quot;https://x.com/karpathy/status/1980397031542989305&quot;&gt;recent&lt;/a&gt; pixel-fetish-brouhaha about modeling text using pictures of text, I wonder how such idiosyncrasies in the pre-training datasets will usher in corner cases that are currently not seen with textual data modeling with, err, text. (See this fascinating &lt;a href=&quot;https://youtu.be/XkPdoEMPJwU?si=b4JHpsmKyG-tNzl9&quot;&gt;video&lt;/a&gt; from the recent &lt;a href=&quot;https://tokenization-workshop.github.io/schedule/&quot;&gt;tokenization workshop&lt;/a&gt; below).&lt;/p&gt;

&lt;p&gt;For now, likes of Gemini seem to be navigating this ambiguity just fine!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!g6ek!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb378eb-dddf-4b54-ac17-870f55f76263_1536x1508.png&quot;&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/matilda-image-07.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;3c-the-miniscule-computing-giant-that-was-ibm-704&quot;&gt;3c: The minuscule computing giant that was the IBM 704!&lt;/h2&gt;

&lt;p&gt;These days, it is the norm to flaunt the vulgar expanse of GPU silicon real estate you were able to throw at the model slop being peddled in AI/LLM papers. The flaunt in Kelley’s paper was the &lt;a href=&quot;https://www.columbia.edu/cu/computinghistory/704.html&quot;&gt;IBM 704&lt;/a&gt;, whose stats just blew my mind. I am going to write down my CliffsNotes in a pointwise manner and will probably visit and revisit them every now and then, just to relive the early days of the conquest of the comp-core monoculture that conquered humanity (and to also remind myself to never label myself GPU-poor).&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Introduced in 1954, the 704 was the first mass-produced computer with hardware for floating-point arithmetic and was regarded as “&lt;a href=&quot;https://www.theregister.com/2015/02/26/my_aunt_was_a_human_assembler_at_nasa/&quot;&gt;pretty much the only computer that could handle complex math&lt;/a&gt;”.&lt;/li&gt;
  &lt;li&gt;It was a vacuum-tube machine that failed roughly every 8 hours.&lt;/li&gt;
  &lt;li&gt;It peaked at around 12,000 floating-point additions per second.&lt;/li&gt;
  &lt;li&gt;It ran FORTRAN and LISP.&lt;/li&gt;
  &lt;li&gt;It had one 38-bit accumulator, one 36-bit multiplier/quotient register, and three 15-bit index registers. That’s it!&lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

</description>
        <pubDate>Sun, 26 Oct 2025 06:09:06 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/ai-history/backpropagation/research/2025/10/26/matilda-mars-and-markup-the-curious-case-of-mrs-agnes-zevens/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/ai-history/backpropagation/research/2025/10/26/matilda-mars-and-markup-the-curious-case-of-mrs-agnes-zevens/</guid>
        
        <category>artificial-intelligence</category>
        
        <category>history</category>
        
        <category>matilda effect</category>
        
        
        <category>ai-history</category>
        
        <category>backpropagation</category>
        
        <category>research</category>
        
      </item>
    
      <item>
        <title>A study of “A Study of Face Obfuscation in ImageNet”</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;a-study-of-a-study-of-face-obfuscation-in-imagenet&quot;&gt;A study of “A Study of Face Obfuscation in ImageNet”&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Table of Contents:&lt;/strong&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;· Background&lt;br /&gt;
· Main Issues&lt;br /&gt;
∘ Issue-1: The curious case of Face obfuscation&lt;br /&gt;
∘ Issue-2: NSFW analysis&lt;br /&gt;
∘ Issue-3: Human co-occurrence analysis&lt;br /&gt;
∘ 👹FAQs of the ‘Devil’s advocacy’ kind: Our humble tribute to the cult of “Both-sideism”&lt;br /&gt;
∘ 👹1: Why not just contact them in lieu of public grandstanding? &lt;br /&gt;
∘ 👹2: Maybe your little emails slipped through the cracks, perhaps?&lt;br /&gt;
∘ 👹 3: Well, OK. But PL is not the sole author. How do you know all the co-authors and the collaborators in the acknowledgment were even aware of your work?!&lt;br /&gt;
∘ 👹 4: In the Wired interview published on March 15th, when pressed by the reporter, one of the authors states that “a citation will appear in an updated version of the paper”. Doesn’t that solve the problem?&lt;br /&gt;
· Concluding thoughts: The real issues&lt;br /&gt;
∘ a) Erasure of black-women-scholarship:&lt;br /&gt;
∘ b) Revisiting the horrors of ghost labor:&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;background&quot;&gt;Background&lt;/h3&gt;

&lt;p&gt;On June 24, 2020, Abeba Birhane and I released our paper “&lt;a href=&quot;https://arxiv.org/pdf/2006.16923.pdf&quot;&gt;Large image datasets: A pyrrhic win for computer vision?&lt;/a&gt;”, critiquing the culture of large scale datasets in Computer Vision. In the paper, we performed &lt;em&gt;a cross-sectional model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality-analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions&lt;/em&gt;, using the &lt;a href=&quot;http://image-net.org/download&quot;&gt;&lt;strong&gt;ImageNet dataset&lt;/strong&gt;&lt;/a&gt; as a template. The nature and the expanse of the transgressions attracted quite some media attention (see &lt;a href=&quot;https://www.theregister.com/2020/07/01/mit_dataset_removed/&quot;&gt;this&lt;/a&gt;, &lt;a href=&quot;https://venturebeat.com/2020/07/01/mit-takes-down-80-million-tiny-images-data-set-due-to-racist-and-offensive-content/&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;https://venturebeat.com/2020/07/15/announcing-the-ai-innovation-awards-winners-at-transform-2020/&quot;&gt;this&lt;/a&gt;). In Section 2.3 of our paper, we revisited the downstream effects of “&lt;em&gt;The WordNet Effect&lt;/em&gt;” (which results from inheriting labels from the WordNet taxonomy) and showed how this affects not just the ImageNet dataset but also other datasets, such as the &lt;a href=&quot;https://people.csail.mit.edu/torralba/publications/80millionImages.pdf&quot;&gt;Tiny Images dataset&lt;/a&gt; and the latest &lt;a href=&quot;https://github.com/vinayprabhu/Crimes_of_Vision_Datasets/blob/master/Notebooks/Notebook_4_ml_images_unsafe.ipynb&quot;&gt;Tencent-ML-Images dataset&lt;/a&gt;, that either directly or indirectly inherited their label-space from WordNet. On June 29th 2020, we learnt that the curators of the Tiny Images dataset had &lt;a href=&quot;https://groups.csail.mit.edu/vision/TinyImages/&quot;&gt;apologized and withdrawn the dataset&lt;/a&gt;.&lt;br /&gt;
In Jan 2021, the paper was formally presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (&lt;a href=&quot;https://openaccess.thecvf.com/content/WACV2021/html/Birhane_Large_Image_Datasets_A_Pyrrhic_Win_for_Computer_Vision_WACV_2021_paper.html&quot;&gt;WACV -2021&lt;/a&gt;) and has been cited in more than two dozen papers since.&lt;br /&gt;
In the backdrop of all of this work, this Wednesday, on the 10th of March 2021, we encountered a paper titled &lt;a href=&quot;https://arxiv.org/pdf/2103.06191.pdf&quot;&gt;&lt;em&gt;A Study of Face Obfuscation in ImageNet&lt;/em&gt;&lt;/a&gt; from the ImageNet curators that has left us disappointed and flummoxed. By indulging in &lt;em&gt;what appears&lt;/em&gt; to be a calculated and systematic erasure of the entire body of critique that our work was a part of, the authors have sent out a wide range of wrong signals. This erasure is doubly disappointing given the sheer clout she enjoys in the field and how the community had recently rallied behind the main visionary of the ImageNet project when her contributions towards the “&lt;em&gt;AI revolution&lt;/em&gt;” were being erased in an online compendium.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-1.png&quot; alt=&quot;&quot; /&gt;Left: Version of Brief History of Deep Learning from 1943–2019 [Timeline] on Apr 23, 2020. Right: Version today after the community uproar driven by Dr. Gebru’s&lt;a href=&quot;https://twitter.com/timnitGebru/status/1252752743942328321?s=20&quot;&gt; tweet&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below, we bemoan this unfortunate departure from norms of academic integrity by carefully disentangling the specific details &lt;em&gt;that characterize the situation from our standpoint&lt;/em&gt;. In doing so, we are sharing the exact snapshots of the conversation(s) that unraveled between the parties involved here. &lt;br /&gt;
&lt;strong&gt;Pre-script&lt;/strong&gt;: The authors of the paper &lt;a href=&quot;https://arxiv.org/abs/2006.16923&quot;&gt;Large image datasets: A pyrrhic win for computer vision?&lt;/a&gt; (and of this blog-post you are reading) are abbreviated as VP (Vinay Prabhu) and AB (Abeba Birhane) respectively in the rest of the material presented here. PL refers to the &lt;a href=&quot;https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/&quot;&gt;main visionary&lt;/a&gt; of the ImageNet dataset. Wherever relevant, ‘&lt;em&gt;our paper&lt;/em&gt;’ refers to &lt;a href=&quot;https://arxiv.org/abs/2006.16923&quot;&gt;&lt;em&gt;Large image datasets: A pyrrhic win for computer vision?&lt;/em&gt;&lt;/a&gt; and ‘&lt;em&gt;their paper&lt;/em&gt;’ refers to &lt;a href=&quot;https://arxiv.org/abs/2103.06191&quot;&gt;&lt;em&gt;A Study of Face Obfuscation in ImageNet&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;main-issues&quot;&gt;Main Issues&lt;/h3&gt;

&lt;h4 id=&quot;issue-1-the-curious-case-of-face-obfuscation&quot;&gt;&lt;strong&gt;Issue-1: The curious case of &lt;em&gt;Face obfuscation&lt;/em&gt;&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Background&lt;/em&gt;: Section 4, &lt;em&gt;Candidate solutions: The path ahead&lt;/em&gt;, was perhaps the most difficult section of our paper to author. We knew we were submitting to WACV (on the stubborn insistence of VP), an unlikely venue for ‘&lt;em&gt;fairness papers&lt;/em&gt;’ whose submission portal did not even have a primary or secondary topic for “&lt;em&gt;Explainable AI, fairness, accountability, privacy, and ethics in vision&lt;/em&gt;”, although it did receive a mention in the CFP. (See screenshot below.)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-2.png&quot; alt=&quot;&quot; /&gt;Screenshot of the email VP wrote to WACV organizers. There was no reply received to this email.&lt;/p&gt;

&lt;p&gt;Our goal, simply put, was to &lt;strong&gt;&lt;em&gt;engage directly with the practitioners in the field&lt;/em&gt;&lt;/strong&gt; and not just the ethics community. And, in many ways, it did culminate in a rather “&lt;em&gt;lively discussion&lt;/em&gt;” when we presented the paper at WACV in a session chaired by Jordi Pont-Tuset, a Research Scientist @ Google Zürich.&lt;br /&gt;
At this juncture, we’d like to share that AB, along with many of our other colleagues and pre-reviewers, rightfully questioned the very need for the section, as they felt it reeked of tech-solutionism. Nonetheless, predicting the clamor for ‘possible solutions’ from the reviewers of this traditional Computer Vision conference (an assumption that was eventually proven correct), the section persisted. In this regard, we’d like to draw the attention of the reader to Section 4.3 in our paper, literally titled “&lt;em&gt;Differentially private obfuscation of the faces&lt;/em&gt;”, where we state: “&lt;em&gt;This path entails harnessing techniques such as DP-Blur [36] with quantifiable privacy guarantees to obfuscate the identity of the humans in the image. The Inclusive images challenge [94], for example, already incorporated blurring during dataset curation and addressed the downstream effects surrounding change in predictive power of the models trained on the blurred versions of the dataset curated. We believe that replication of this template that also clearly included avenues for recourse in case of an erroneously non-blurred image being sighted by a researcher will be a step in the right direction for the community at large&lt;/em&gt;”. As evinced by the papers we cited, privacy-preserving obfuscation of images is &lt;strong&gt;&lt;em&gt;not a novel idea and most certainly not our idea&lt;/em&gt;&lt;/strong&gt;. But, in the specific context of imagining a face-obfuscated version of ImageNet, it is reasonable to assume that anyone who authors a paper audaciously titled “&lt;em&gt;A Study of Face Obfuscation in ImageNet&lt;/em&gt;” would pay at least lip-service to citing either our work and/or [94] in our paper, which is:&lt;br /&gt;
&lt;em&gt;[94] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536, 2017.&lt;/em&gt;&lt;br /&gt;
But the authors chose not to cite this one either. Their paper begins with “&lt;em&gt;Image obfuscation (blurring, mosaicing, etc.) is widely used for privacy protection. However, computer vision research often overlooks privacy by assuming access to original unobfuscated images&lt;/em&gt;” (like, really?! &lt;a href=&quot;https://emojipedia.org/face-with-rolling-eyes/&quot;&gt;🙄&lt;/a&gt;) and goes on to claim that they have discovered that “&lt;em&gt;…the dataset exposes many people co-occurring with other objects in images, e.g., people sitting on chairs, walking their dogs, or drinking beer (Fig. 1). It is concerning since ILSVRC is publicly available and widely used.&lt;/em&gt;” &lt;a href=&quot;https://emojipedia.org/woozy-face/&quot;&gt;🥴&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-3.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Also, we’d like to ask the readers to take 2 minutes to parse these FAQs from the associated Kaggle contest ([94] in our paper) from &amp;gt; 2 years ago and then read &lt;em&gt;their paper&lt;/em&gt; again 🤐&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-4.png&quot; alt=&quot;&quot; /&gt;Source: &lt;a href=&quot;https://www.kaggle.com/c/inclusive-images-challenge/overview/inclusive-images-faq#recognizable-faces&quot;&gt;https://www.kaggle.com/c/inclusive-images-challenge/overview/inclusive-images-faq#recognizable-faces&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;issue-2-nsfw-analysis&quot;&gt;&lt;strong&gt;Issue-2: NSFW analysis&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;The term NSFW appears 38 times in our paper. We not only curated a class-wise meta-dataset (df_nsfw.csv | Size: (1000, 5)) consisting of the mean and std of the NSFW scores of the train and validation images arranged per-class, but also dedicated Appendix B.2 to “&lt;em&gt;NSFW scoring aided misogynistic imagery hand-labeling&lt;/em&gt;”. In Table-5, we specifically focus on classes 445, 638, 639, 655 and 459, mapping to bikini, two-piece, maillot, miniskirt and brassiere/bra/bandeau in the dataset, which we found to be NSFW-dense classes.&lt;/p&gt;
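&lt;p&gt;For readers who want to poke at that meta-dataset themselves, here is a minimal pandas sketch (the column names and the row-per-class ordering are assumptions inferred from the description above, not the actual schema):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

# Illustrative only: df_nsfw.csv is the (1000, 5) per-class meta-dataset
# described above; the column names here are guesses, not the real schema.
df = pd.read_csv("df_nsfw.csv")
df.columns = ["class_id", "train_mean", "train_std", "val_mean", "val_std"]

# The NSFW-dense classes called out in Table-5 (assuming one row per
# ILSVRC class, positionally indexed 0-999).
dense = [445, 638, 639, 655, 459]
print(df.iloc[dense].sort_values("train_mean", ascending=False))
&lt;/code&gt;&lt;/pre&gt;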

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-5.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Again, much to our disappointment, the authors claim to have discovered that: “&lt;em&gt;The number of NSFW areas varies significantly across different ILSVRC categories. Bikini is likely to contain much more NSFW areas than the average.&lt;/em&gt;” &lt;a href=&quot;https://emojipedia.org/unamused-face/&quot;&gt;😒&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-6.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;issue-3-human-co-occurrence-analysis&quot;&gt;&lt;strong&gt;Issue-3: Human co-occurrence analysis&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;In our paper, we dedicated Section &lt;em&gt;B.3 Dogs to musical instruments: Co-occurrence based gender biases&lt;/em&gt; to human co-occurrence biases, specifically with regard to the dog-breed and musical-instrument classes that have a high density of incidentally co-occurring humans. Their new paper states: “&lt;em&gt;Results suggests that super categories such as clothing and musical instrument frequently co-occur with people&lt;/em&gt;” 🤦&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-7.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;faqs-of-the-devils-advocacy-kind-our-humble-tribute-to-the-cult-of-both-sideism&quot;&gt;👹FAQs of the ‘Devil’s advocacy’ kind: Our humble tribute to the cult of “Both-sideism”&lt;/h4&gt;

&lt;p&gt;Given the attention this might elicit, we pre-emptively anticipate the exact flavor of attacks and cover the following “&lt;em&gt;Devil’s advocacy counter-points&lt;/em&gt;” in the section below:&lt;/p&gt;

&lt;h4 id=&quot;-1-why-not-just-contact-them-in-lieu-of-public-grandstanding-have-you-bothered-to-even-contact-the-curators-of-the-imagenet-dataset&quot;&gt;👹 &lt;em&gt;1: Why not just contact them in lieu of public grandstanding? Have you bothered to even contact the curators of the ImageNet dataset?&lt;/em&gt;&lt;/h4&gt;

&lt;p&gt;Yes! Glad you asked. Here are the screenshots of our emails, dating all the way back to Aug 19th 2019 and later, on Apr 12, 2020, to which we received no replies whatsoever:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-8.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Understanding the magnitude of the impact, and being wary of any possible &lt;a href=&quot;https://en.wikipedia.org/wiki/Streisand_effect&quot;&gt;Streisand effect&lt;/a&gt;, we spent the entirety of the near-&lt;strong&gt;10 month period&lt;/strong&gt; between Aug 2019 and Jun 2020 in various outreach efforts amongst journalists, Computer Vision and Ethics communities, and organizations. This also involved VP working with journalists such as &lt;a href=&quot;https://twitter.com/katyanna_q?lang=en&quot;&gt;Katyanna Quach&lt;/a&gt; at &lt;em&gt;The Register&lt;/em&gt;, who then authored this article: &lt;a href=&quot;https://www.theregister.com/2019/10/23/ai_dataset_imagenet_consent/&quot;&gt;Inside the 1TB ImageNet data set used to train the world’s AI: Naked kids, drunken frat parties, porno stars, and more&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;-2-oh-come-on-stop-with-the-self-aggrandizing-and-self-loathing-ai-royalty-tend-to-receive-hundreds-of-emails-a-day-may-be-your-little-emails-slipped-through-the-cracks-perhaps&quot;&gt;👹 &lt;em&gt;2: Oh come on! Stop with the self-aggrandizing and self-loathing. AI royalty tend to receive hundreds of emails a day. Maybe your little emails slipped through the cracks, perhaps?&lt;/em&gt;&lt;/h4&gt;

&lt;p&gt;Again, glad you asked! &lt;br /&gt;
&lt;strong&gt;Lemma-1: PL was &lt;em&gt;extremely&lt;/em&gt; well aware of the work.&lt;/strong&gt;&lt;br /&gt;
&lt;strong&gt;Proof&lt;/strong&gt;: The paper that we published draws heavily from my talk “&lt;a href=&quot;https://hai.stanford.edu/events/hai-weekly-seminar-vinay-uday-prabhu-four-horsemen-ethical-malice-peer-reviewed-machine&quot;&gt;On the four horsemen of ethical malice in peer reviewed machine learning literature&lt;/a&gt;”, given under the aegis of the Stanford-HAI weekly seminars (thanks to an invite from &lt;a href=&quot;https://scholar.google.com/citations?user=S09IVcYAAAAJ&amp;amp;hl=en&quot;&gt;Colin Kelley Garvey&lt;/a&gt;, an AI ethicist) on April 17, 2020, 11:00am–12:00pm. On Apr 15th, I received this email from an HAI co-ordinator stating that “&lt;em&gt;I just spoke with HAI Co-Director, Fei-Fei Li, and she would like to come on screen after you finish your talk and ask you a few before Colin gives you questions from the audience. Please let me know if you are comfortable with this request&lt;/em&gt;”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-9.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This was followed by the first-ever communication I received, unprompted, from PL, a screenshot of which is below.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-10.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This was followed by a delayed reply on April 17th, which read…&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-11.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;and-lastly-here-is-the-actual-video-of-our-zoom-face-to-face-meeting--httpsyoutubehpa67idxngu&quot;&gt;And lastly, here is the actual video of our zoom-face-to-face meeting &lt;a href=&quot;https://emojipedia.org/video-camera/&quot;&gt;📹&lt;/a&gt; &lt;a href=&quot;https://youtu.be/hpA67iDxNGU&quot;&gt;https://youtu.be/hpA67iDxNGU&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Q.E.D.&lt;/strong&gt;!&lt;br /&gt;
&lt;strong&gt;&lt;em&gt;👹 3: Well, OK. But PL is not the sole author. How do you know all the co-authors and the collaborators in the acknowledgment were even aware of your work?!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because they have literally cited us just recently&lt;a href=&quot;https://qwerty.dev/interrobang/&quot;&gt;‽&lt;/a&gt; In their paper titled “&lt;em&gt;REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets&lt;/em&gt;”, the authors contextualize our work by citing “&lt;em&gt;Recent work [51] has looked at dataset issues related to consent and justice, and motivate enforcing Institutional Review Boards (IRB) approval for large scale datasets&lt;/em&gt;.” A reductionist take on our work, but a proof-of-awareness nonetheless!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-12.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;-4-in-the-wired-interviewhttpswwwwiredcomstoryresearchers-blur-faces-launched-thousand-algorithms-published-on-march-15th-when-pressed-by-the-reporter-one-of-the-authors-states-that-a-citation-will-appear-in-an-updated-version-of-the-paper-doesnt-that-solve-the-problem&quot;&gt;👹 &lt;em&gt;4: In the &lt;a href=&quot;https://www.wired.com/story/researchers-blur-faces-launched-thousand-algorithms/&quot;&gt;Wired interview&lt;/a&gt; published on March 15th, when pressed by the reporter, one of the authors states that “a citation will appear in an updated version of the paper”. Doesn’t that solve the problem?&lt;/em&gt;&lt;/h4&gt;

&lt;p&gt;Again: this blog is not about citation-seeking. We’d like to clearly point out that the biggest shortcomings are the tactical abdication of responsibility for all the mess in ImageNet, combined with the systematic erasure of related critical work that might well have led to these corrective measures being taken.&lt;br /&gt;
The authors tactically left out an entire body of literature critiquing ImageNet, beginning with the &lt;a href=&quot;https://arxiv.org/abs/1905.01347&quot;&gt;ImageNet audits&lt;/a&gt; by Chris Dulhanty and Alexander Wong (not to mention Chris’ entire &lt;a href=&quot;https://uwspace.uwaterloo.ca/handle/10012/16414&quot;&gt;thesis&lt;/a&gt;) and extending to more recent data-archeological expeditions such as &lt;a href=&quot;https://logicmag.io/commons/lines-of-sight/&quot;&gt;Lines of Sight&lt;/a&gt; by Alex Hanna et al. This shouldn’t come as a surprise to anybody: in their last inquisition into the &lt;em&gt;Person subtree&lt;/em&gt; (where they admitted that, of the 2832 people categories annotated within the subtree, 1593 were potentially offensive labels and only 158 were visual), they made &lt;strong&gt;no mention of the hugely influential ImageNet Roulette project&lt;/strong&gt; (which went viral on September 19, 2019, while their paper only hit the ArXiv servers on 16 Dec 2019!). Also, lest we forget, these &lt;em&gt;solutions&lt;/em&gt; are being ushered in a good 12 years after the dataset release. T-W-E-L-V-E YEARS!&lt;/p&gt;

&lt;h3 id=&quot;concluding-thoughts-the-real-issues&quot;&gt;Concluding thoughts: The real issues&lt;/h3&gt;

&lt;h4 id=&quot;a-erasure-of-black-women-scholarship&quot;&gt;&lt;strong&gt;a) Erasure of black-women-scholarship:&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;AB’s central role in turning a rag-tag set of empirical results and revelations into a cogent, peer-review-worthy publication, and her later efforts championing its cause via talks, interviews and presentations, is one of the main reasons why the paper is even being cited now. The primacy of her contributions is also reflected in the official citation, which literally reads:&lt;br /&gt;
@inproceedings{birhane2021large,&lt;br /&gt;
title={Large Image Datasets: A Pyrrhic Win for Computer Vision?},&lt;br /&gt;
author={Birhane, Abeba and Prabhu, Vinay Uday},&lt;br /&gt;
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},&lt;br /&gt;
pages={1537--1547},&lt;br /&gt;
year={2021}&lt;br /&gt;
}&lt;br /&gt;
But, unfortunately for her, the undervaluing of her scholarship is not an aberration but a trend. Black women’s intellectual production has historically been ignored and systemically erased. The hierarchical academic structure that devalues Black women’s intellectual contributions makes contesting such injustice a tiresome endeavor, discouraging Black women scholars from coming forward. Black feminist theory scholars, such as Jennifer Nash, have extensively explored the &lt;a href=&quot;https://www.bcheights.com/2019/11/17/nash-calls-for-stewardship-in-black-feminist-citation/&quot;&gt;Citational Desires&lt;/a&gt; of scholars whose contributions have been systematically under-emphasized. Initiatives such as the Cite Black Women collective (&lt;a href=&quot;https://www.citeblackwomencollective.org/&quot;&gt;https://www.citeblackwomencollective.org/&lt;/a&gt;) work towards dismantling precisely this behavior in academia, and it is unfortunate to see this behavior reinforced by highly esteemed scholars who are supposed to be the torchbearers of hope.&lt;/p&gt;

&lt;h4 id=&quot;b-revisiting-the-horrors-of-ghost-labor&quot;&gt;&lt;strong&gt;b) Revisiting the horrors of ghost labor:&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;During our draft revisions, specifically of Section-4, AB and I were in the midst of a ‘&lt;em&gt;How do we go about fixing this morass?&lt;/em&gt;’ conversation when we realized that, in order to truly &lt;em&gt;clean up&lt;/em&gt; the dataset, we’d be forced to make two massive compromises:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Resort to using the unethical “SoTA” tools from companies like Amazon, Face++ or Clarifai to perform face detection and filter the problematic images&lt;/li&gt;
  &lt;li&gt;Resort to exploiting the ghost labor markets of AMT to hand-annotate the NSFW facet of the dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As it turns out, on the very same day that the &lt;a href=&quot;https://turkopticon.ucsd.edu/&quot;&gt;Turkopticon&lt;/a&gt; fundraising campaign was announced, mere hours later, we see the efforts of this paper falling prey to both of these ills. In fact, the gamified HIT (Human Intelligence Task) details read 🤢: &lt;em&gt;These images have verified ground truth faces, but we intentionally show incorrect annotations for the workers to fix. The entire HIT resembles an action game. Starting with 2 lives, the worker will lose a life when making a mistake on gold standard images. In that case, they will see the ground truth faces (Fig. B Right) and the remaining lives. If they lose both 2 lives, the game is over, and they have to start from scratch at the first image. We found this strategy to effectively retain workers’ attention and improve annotation quality.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-study-of-a-study-of-face-obfuscation-in-imagenet-img-13.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;(Also see &lt;a href=&quot;https://www.vice.com/en/article/88apnv/underpaid-workers-are-being-forced-to-train-biased-ai-on-mechanical-turk&quot;&gt;https://www.vice.com/en/article/88apnv/underpaid-workers-are-being-forced-to-train-biased-ai-on-mechanical-turk&lt;/a&gt; )&lt;/p&gt;

&lt;p&gt;To conclude, we say:&lt;br /&gt;
- This is NOT us desperately hoping to drum up some antics to garner more attention&lt;br /&gt;
- This is NOT us trying to eke out one more citation &lt;br /&gt;
- This is NOT us assuming the proverbial higher pedestal and judging anyone&lt;br /&gt;
- This is NOT an ad hominem attack on any member of the ImageNet team. &lt;br /&gt;
- This IS us calling out a &lt;strong&gt;&lt;em&gt;pattern of citation erasure&lt;/em&gt;&lt;/strong&gt; (with specific verifiable proofs) and highlighting the ethical shortcomings in a paper that will probably be extremely well cited in the near future and much worse, celebrated (wrongly IMHO) as a template for stop-gap fixes.&lt;br /&gt;
We call upon the curators of the dataset to pay heed to the issues raised and take corrective measures.&lt;/p&gt;

&lt;p&gt;Kindest regards,&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Abeba Birhane and Vinay Prabhu&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PS: If all of this is confusing, here is the VERIFIABLE timeline of events to summarize what happened.&lt;br /&gt;
&lt;em&gt;1: 19 Aug 2019 — Contacted ImageNet curators via email. No response.&lt;br /&gt;
2: Sep 2019: Chat with &lt;a href=&quot;https://www.theregister.com/Author/Katyanna-Quach/&quot;&gt;Katyanna Quach&lt;/a&gt; at ‘The Register’ in order to research the specific details regarding ImageNet for an impending article.&lt;br /&gt;
3: 23 Oct 2019: Register article comes out: &lt;a href=&quot;https://www.theregister.com/2019/10/23/ai_dataset_imagenet_consent/&quot;&gt;https://www.theregister.com/2019/10/23/ai_dataset_imagenet_consent/&lt;/a&gt;&lt;br /&gt;
4: 12 Apr 2020: Second email contact with the ImageNet curators. No response.&lt;br /&gt;
5: 15 Apr 2020: PL contacts me via email.&lt;br /&gt;
6: Apr 25, 2020: Talk at Stanford that PL attends, titled “&lt;a href=&quot;https://hai.stanford.edu/events/hai-weekly-seminar-vinay-uday-prabhu-four-horsemen-ethical-malice-peer-reviewed-machine&quot;&gt;Ethical Malice in Peer-Reviewed Machine Learning Literature&lt;/a&gt;” (video link included).&lt;br /&gt;
7: June 2020: The first version of our paper appears on ArXiv: &lt;a href=&quot;https://arxiv.org/abs/2006.16923&quot;&gt;https://arxiv.org/abs/2006.16923&lt;/a&gt;&lt;br /&gt;
8: March 2021: PL et al. publish “A Study of Face Obfuscation in ImageNet” sans any citation or acknowledgement.&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 15 Mar 2021 21:32:31 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2021/03/15/a-study-of-a-study-of-face-obfuscation-in-imagenet/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2021/03/15/a-study-of-a-study-of-face-obfuscation-in-imagenet/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
      <item>
        <title>Scrutinizing Saliency Based Image Cropping</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;scrutinizing-saliency-based-image-cropping&quot;&gt;Scrutinizing Saliency Based Image Cropping&lt;/h3&gt;

&lt;p&gt;Last week, the saliency-based image &lt;a href=&quot;https://www.theguardian.com/technology/2020/sep/21/twitter-apologises-for-racist-image-cropping-algorithm&quot;&gt;cropping algorithm&lt;/a&gt; deployed by twitter came under &lt;a href=&quot;https://www.theguardian.com/technology/2020/sep/21/twitter-apologises-for-racist-image-cropping-algorithm&quot;&gt;scrutiny&lt;/a&gt;. Inspired by some of the conversations that unraveled on Twitter and the widely shared reports of racial discrimination, we sought to investigate, experiment with, and elucidate the workings of cropping algorithms. Following up from &lt;a href=&quot;https://medium.com/@VinayPrabhu/on-the-twitter-cropping-controversy-critique-clarifications-and-comments-7ac66154f687&quot;&gt;last week&lt;/a&gt;, here are the updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Democratizing the audit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In order to democratize the scrutiny of this technology, we have created an educational saliency-based cropping &lt;a href=&quot;https://gradio.app/demo/saliency&quot;&gt;app&lt;/a&gt; where you can upload images and see what a &lt;em&gt;state-of-the-art&lt;/em&gt; machine learning model &lt;em&gt;similar&lt;/em&gt; to the one deployed by twitter &lt;em&gt;thinks&lt;/em&gt; are the important parts of the image, and see how that determines which parts of the image get cropped out. (Please note that the exact model and the cropping policy used by twitter are both, to the best of our knowledge, proprietary and beyond &lt;em&gt;easy&lt;/em&gt; access. Therefore, our reconstruction is limited to what is available in peer-reviewed, open-sourced academic literature.) We have also added an interactive &lt;em&gt;TOAST UI image editor&lt;/em&gt; that one can use to further explore the brittleness of this technology.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/scrutinizing-saliency-based-image-cropping-img-1.png&quot; alt=&quot;&quot; /&gt;The gradio user interface&lt;img src=&quot;/alignchronicles/assets/images/posts/scrutinizing-saliency-based-image-cropping-img-2.png&quot; alt=&quot;&quot; /&gt;The inbuilt image editor to conduct the experiments&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On saliency based cropping&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Saliency-based cropping is not unique to Twitter. This very same technology is also used by other tech firms including &lt;a href=&quot;https://twitter.com/AnimaAnandkumar/status/1308096236159893505?s=20&quot;&gt;Google&lt;/a&gt;, &lt;a href=&quot;https://patents.google.com/patent/US9626584B2/en&quot;&gt;Adobe&lt;/a&gt;, and &lt;a href=&quot;https://developer.apple.com/documentation/vision/cropping_images_using_saliency&quot;&gt;Apple&lt;/a&gt;. This technique, which &lt;a href=&quot;https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/Smart-Auto-Cropping-of-Images.html&quot;&gt;twitter admittedly uses&lt;/a&gt; on its platform, typically entails two phases: the saliency mask estimation phase and the cropping phase.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;In the first phase, a &lt;em&gt;saliency mask&lt;/em&gt; is estimated using a machine learning model that ingests an input image and speculates which parts of the image are interesting and/or important (retain-worthy) and which parts of the image are &lt;em&gt;discardable&lt;/em&gt; (or crop-worthy). These machine learning models are typically trained on datasets such as&lt;a href=&quot;http://salicon.net/&quot;&gt; SALICON&lt;/a&gt;,&lt;a href=&quot;http://saliency.mit.edu/datasets.html&quot;&gt; MIT-1003 and CAT2000&lt;/a&gt; with attention-annotated “ &lt;em&gt;ground truth&lt;/em&gt; ” saliency maps collected by either using volunteers or crowd-sourcing exercises.&lt;/li&gt;
  &lt;li&gt;In the second phase, the saliency map output in the first phase is then used to come up with a cropping policy that results in a cropped image with the so-perceived non-salient parts of the image being removed and the so-perceived salient parts of the image being retained.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As it turns out, this cropping process is a double-edged sword. As is evident in &lt;a href=&quot;https://gradio.app/demo/saliency&quot;&gt;these example images&lt;/a&gt;, even when the cropped image &lt;em&gt;seems fair&lt;/em&gt;, the cropping has in fact masked the differential saliency that the machine learning model associates with the different constituent faces in the image, and some of these nuanced facets of biased ugliness are obfuscated in the finally rendered image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On the saliency model we used for the gradio app&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Given that both twitter’s saliency-estimation model and its cropping policy are not in the public domain, we used a &lt;em&gt;similar&lt;/em&gt; model from peer-reviewed machine learning literature that emulates twitter’s cropping algorithm. We looked for a SoTA model that was open-sourced and used the &lt;a href=&quot;https://github.com/alexanderkroner/saliency&quot;&gt;MSI-Net&lt;/a&gt; model, which ranked high on the &lt;a href=&quot;https://saliency.tuebingen.ai/results.html&quot;&gt;MIT/Tuebingen Saliency Benchmark&lt;/a&gt;. The associated paper is &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0893608020301660&quot;&gt;Contextual Encoder–Decoder Network for Visual Saliency Prediction&lt;/a&gt; by Kroner et al. Since this model only maps an input image to a saliency map and doesn’t perform any cropping, we authored a cropping function which is &lt;em&gt;a sliding window with a fixed aspect ratio (16:9) that maximizes the sum of saliency&lt;/em&gt;. Our code is open-sourced, and you can find everything required to build this interface &lt;a href=&quot;https://github.com/gradio-app/saliency&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
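&lt;p&gt;For readers who just want the gist of that cropping policy, here is a minimal sketch of a 16:9 sliding window that maximizes the sum of saliency (assuming phase-1 hands you the saliency map as a 2-D numpy array; the function name and the integral-image shortcut are our illustrative choices here, not necessarily the exact code in the linked repo):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np

def crop_max_saliency(saliency, aspect=(16, 9)):
    """Slide a fixed-aspect-ratio window over a 2-D saliency map and
    return the (top, left, height, width) box with maximal total saliency."""
    H, W = saliency.shape
    # Largest aspect-true window that fits inside the map.
    crop_h = min(H, (W * aspect[1]) // aspect[0])
    crop_w = (crop_h * aspect[0]) // aspect[1]
    # A zero-padded integral image scores every candidate window at once.
    ii = np.pad(saliency.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    scores = (ii[crop_h:, crop_w:] - ii[:H - crop_h + 1, crop_w:]
              - ii[crop_h:, :W - crop_w + 1] + ii[:H - crop_h + 1, :W - crop_w + 1])
    top, left = np.unravel_index(scores.argmax(), scores.shape)
    return top, left, crop_h, crop_w
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The rendered crop is then simply &lt;code&gt;image[top:top + crop_h, left:left + crop_w]&lt;/code&gt;.&lt;/p&gt;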

&lt;p&gt;&lt;strong&gt;Participation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The gradio &lt;a href=&quot;https://gradio.app/demo/saliency&quot;&gt;saliency-based image cropping&lt;/a&gt; app is open for anyone to interact and experiment with. Upload an image and simply click the &lt;strong&gt;submit&lt;/strong&gt; button, which will show you a heatmap of the features that the algorithm picks up as “&lt;em&gt;important&lt;/em&gt;”. &lt;strong&gt;&lt;em&gt;We do not save or store your images.&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt;
If you come across an unusual, discriminatory, or biased saliency distribution that you’d like us to pay heed to or include in a forthcoming academic dissemination, please let us know by dropping it &lt;a href=&quot;https://www.dropbox.com/request/gFaju50BlFyiGHnCLJ08&quot;&gt;here&lt;/a&gt;. (However, please make sure that the images you upload are consensually sourced and adhere to &lt;a href=&quot;https://creativecommons.org/about/cclicenses/&quot;&gt;CC-BY regulations&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Team:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/vinayprabhu&quot;&gt;Vinay Prabhu&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/Abebab&quot;&gt;Abeba Birhane&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/si3luwa&quot;&gt;Ali Abdalla&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/IDoTheThinking&quot;&gt;Darrell Owens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Fri, 02 Oct 2020 18:01:34 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2020/10/02/scrutinizing-saliency-based-image-cropping/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2020/10/02/scrutinizing-saliency-based-image-cropping/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
      <item>
        <title>On the twitter cropping controversy: Critique, clarifications and comments</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;on-the-twitter-cropping-controversy-critique-clarifications--comments--1&quot;&gt;On the twitter cropping controversy: Critique, clarifications &amp;amp; comments - 1&lt;/h3&gt;

&lt;p&gt;TL;DR: &lt;em&gt;Fcuk the algorithm&lt;/em&gt;&lt;br /&gt;
Table of Contents:&lt;br /&gt;
- Experiment details&lt;br /&gt;
- Feedback from twitter: The two camps of the Rashomon aisle&lt;br /&gt;
- Unbiased algorithmic saliency cropping is a pipedream&lt;br /&gt;
- Conclusion&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;a href=&quot;https://twitter.com/vinayprabhu/status/1307460502017028096?s=20&quot;&gt;https://twitter.com/vinayprabhu/status/1307460502017028096&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This Saturday noon, I was made aware of an ongoing saliency-based cropping-bias fracas on my &lt;a href=&quot;https://twitter.com/Abebab/status/1307430893418688512?s=20&quot;&gt;timeline&lt;/a&gt;. As I was literally rushing through the final &lt;a href=&quot;https://github.com/vinayprabhu/CFD_MLImages&quot;&gt;experiments&lt;/a&gt; for a forthcoming paper that seeks to address Computer Vision’s growing fascination with physiognomy, I thought I’d run this little statistical experiment live on twitter with the exact same &lt;a href=&quot;https://chicagofaces.org/default/&quot;&gt;Psychonomic dataset&lt;/a&gt; I was already using to investigate this.&lt;/p&gt;

&lt;h3 id=&quot;experiment-details&quot;&gt;Experiment details:&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://chicagofaces.org/default/download/&quot;&gt;CFD dataset&lt;/a&gt; contains 93 images of consensually collected, &lt;em&gt;self-identified&lt;/em&gt; Black (B) and White (W) faces controlled for &lt;em&gt;saturation, size, resolution, lighting conditions, facial expressions, clothing and face-stimuli (neutral)&lt;/em&gt;. (Please read Pg. 1125 in the &lt;a href=&quot;https://www.wittenbrink.org/cfd/mcw2015.pdf&quot;&gt;paper&lt;/a&gt; for the details.) I generated a 3x1 grid of Black and White faces with an all-black separator image in the middle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why, you ask?&lt;/em&gt;&lt;/strong&gt; Because it was literally the format of the viral image that I saw on my timeline that had cropped out the Black person (see Fig-1 below for the exact screenshot and the code).&lt;/p&gt;
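&lt;p&gt;For anyone who wants to replicate the probe format, here is a minimal PIL sketch (the helper name, separator height, and file names are illustrative assumptions, not the exact code shown in the screenshot):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from PIL import Image

def make_probe(top_path, bottom_path, sep_px=600):
    """Stack two CFD portraits vertically with an all-black separator,
    mimicking the viral 3x1 probe-image format."""
    top, bottom = Image.open(top_path), Image.open(bottom_path)
    width = max(top.width, bottom.width)
    sep = Image.new("RGB", (width, sep_px), "black")
    canvas = Image.new("RGB", (width, top.height + sep_px + bottom.height), "white")
    canvas.paste(top, (0, 0))
    canvas.paste(sep, (0, top.height))
    canvas.paste(bottom, (0, top.height + sep_px))
    return canvas

# Hypothetical file names; CFD uses its own naming scheme.
make_probe("cfd_face_1.jpg", "cfd_face_2.jpg").save("probe.png")
&lt;/code&gt;&lt;/pre&gt;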

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*NJUkvhLW6hvQ3tgfh4TZJA.png&quot; alt=&quot;&quot; /&gt;Figure 1: The tweet that motivated the format of the experiment&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;br /&gt;
After the automated tweets began appearing on my timeline, I screenshotted the images, made a collage, used the &lt;a href=&quot;https://www.makesense.ai/&quot;&gt;makesense&lt;/a&gt; annotation tool and manually conducted the post-cropping census. Of the 93 image pairs, 92 went through (automated tweeting has always been a stochastic, lossy endeavor for me; if you have ideas to fix this, do share them!), and of those 92 images, I saw a 40:52 White:Black split. Here is the link to the &lt;a href=&quot;https://github.com/vinayprabhu/CFD_MLImages/blob/master/Data/cfd_twitter_all.png&quot;&gt;collage&lt;/a&gt; and the &lt;a href=&quot;https://github.com/vinayprabhu/CFD_MLImages/blob/master/Data/cfd_twitter_all.csv&quot;&gt;annotation file&lt;/a&gt;.&lt;br /&gt;
&lt;strong&gt;FAQs:&lt;/strong&gt;&lt;br /&gt;
&lt;em&gt;1: Err, is this how natural images occur in the real world?!&lt;/em&gt;&lt;br /&gt;
A: Nope. There is a group from MIT, whose research I truly admire, that contacted me and is doing this sort of an experiment as I type this. That said, I do have issues with the so-called &lt;em&gt;in the wild&lt;/em&gt; image datasets, on account of the fact that most of these are collected sans consent. &lt;a href=&quot;https://twitter.com/Abebab&quot;&gt;Abeba Birhane&lt;/a&gt; and I literally published a &lt;a href=&quot;https://arxiv.org/pdf/2006.16923.pdf&quot;&gt;25 page paper&lt;/a&gt; covering many of the ethical shortcomings.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*eI1FezjMR7aPi0hUGcTbcw.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2: Why not choose all the 4278 (93 choose 2) pairs for the experiment?&lt;/em&gt;&lt;br /&gt;
A: I am doing exactly that, as stated in my tweet. I literally created an account, &lt;a href=&quot;https://twitter.com/cropping_bias&quot;&gt;@cropping_bias&lt;/a&gt;, to do this in an exhaustive manner, and am waiting on the twitter dev team to approve my developer account access. The last update I have is from Monday, 21 Sep 2020, 5:15 AM, where they are asking me for further clarifications.&lt;br /&gt;
I have performed many experiments with the twitter API before, and API access grants have been pretty swift. I think they are now checking more thoroughly with a human-in-the-loop; as of today, I do not think they are hurriedly updating their algorithm and stalling the key access.&lt;br /&gt;
&lt;em&gt;3: So these initial results demonstrate no racial bias, correct?&lt;/em&gt;&lt;br /&gt;
A: No no no no. &lt;strong&gt;&lt;em&gt;Racism is experiential. Not statistical.&lt;/em&gt;&lt;/strong&gt; If you are banking on statistics to establish that racism exists, you need a soul searching and a half.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*FF4kGEJfZfdW_RnO&quot; alt=&quot;&quot; /&gt;Figure 2: Racism is experiential. What statistic can one possibly assign to this plaque that renders it racist?!&lt;/p&gt;

&lt;h3 id=&quot;feedback-from-twitter-the-two-camps-of-the-rashomon-aisle&quot;&gt;&lt;strong&gt;Feedback from twitter: The two camps of the Rashomon aisle&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;A few hours after I posted the tweet, I observed a rather overwhelming response, including a series of DMs containing strong reactions, appropriations and critiques that left me somewhat, but not entirely, befuddled.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Before any further elaboration, I’d like to explicitly lose the “Model Right Activists” crowd: &lt;strong&gt;Fcuk the algorithm&lt;/strong&gt;!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Okay, that stated, let me draw your attention, firstly, to the &lt;em&gt;Rashomon effect&lt;/em&gt; I observed in the responses received. The exact same 40:52 W:B ratio was interpreted in completely different ways and I’ve had a front row seat to the emergence of the two camps on my timeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp-A&lt;/strong&gt;: The erasure rate of a Black person’s face is 43.5%! &lt;em&gt;Why is this tech even framing it as a binary classification gatekeeping problem&lt;/em&gt;?! This experiment has serious shortcomings too. &lt;em&gt;This is why all tech is a cesspit right now&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you can’t tell already, I am squarely a denizen of this Camp-A&lt;/strong&gt; (&lt;em&gt;Okay, there goes the centrist crowd&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp-B&lt;/strong&gt;: Voila! 56.5% chance of acceptance! Therefore NO RACIAL BIAS! (&lt;em&gt;Psst… many Camp-B members, I suspect, silently believe it’s in fact “reverse racism” at work but won’t say it out aloud.&lt;/em&gt;)&lt;/p&gt;
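&lt;p&gt;For concreteness, both camps are reading the exact same arithmetic off the census above (a trivial sketch):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Of the 92 pairs that posted, the crop kept the White face 40 times
# and the Black face 52 times (the 40:52 W:B split above).
kept_white, kept_black = 40, 52
n = kept_white + kept_black
print(f"Camp-A reading, Black face erased: {kept_white / n:.1%}")  # 43.5%
print(f"Camp-B reading, Black face kept:   {kept_black / n:.1%}")  # 56.5%
&lt;/code&gt;&lt;/pre&gt;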

&lt;p&gt;Now, on the basis of &lt;em&gt;this specific experiment&lt;/em&gt; and &lt;em&gt;MY&lt;/em&gt; value system and beliefs, it is demonstrably clear that Twitter’s image cropping framework &lt;strong&gt;needs a complete overhaul, with some proper soul searching thrown in for good measure.&lt;/strong&gt; Whether it is on the basis of fashionably FAccT-approved statistical metrics (see &lt;a href=&quot;https://en.wikipedia.org/wiki/Goodhart%27s_law&quot;&gt;Goodhart’s law&lt;/a&gt;) or not is irrelevant to me, and it should be to you too.&lt;/p&gt;

&lt;h3 id=&quot;unbiased-algorithmic-saliency-cropping-is-a-pipedream&quot;&gt;&lt;em&gt;Unbiased algorithmic saliency cropping is a pipedream&lt;/em&gt;&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Unbiased algorithmic saliency cropping is a pipedream, and an ill-posed one at that. The very way in which the cropping problem is framed seals its fate, and there is no woke “unbiased” algorithm implemented downstream that could fix it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With the following arguments, I’d like to motivate why the ansatz is &lt;em&gt;reasonable.&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;The Whyness of cropping:&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why even ‘smart crop’? The first question that came to my mind was: &lt;em&gt;What does “smart cropping” achieve that mere ‘dumb’ downsizing does not&lt;/em&gt;? To the detractors playing devil’s advocate and stating that the entire ‘true’ image is just a click away anyway, I ask: &lt;em&gt;Can’t the same logic be used to justify deploying “dumb” downsizing of the image in lieu of a fancy neural network&lt;/em&gt;? If the timeline reader is indeed intrigued by the context of the tweet and the image, won’t they just click on the blurry downsized image? Either way, “it’s just one click away”. No?&lt;br /&gt;
Also, was there really an exodus of users from twitter on account of blurry downsized images that prompted the creation and deployment of this technology, which comes with its own non-trivial carbon footprint and engineering overhead?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. All metrics are wrong. Some are useful, albeit within a utility shelf-life.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Twitter’s dissemination &lt;a href=&quot;https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/Smart-Auto-Cropping-of-Images.html&quot;&gt;here&lt;/a&gt; states verbatim that “&lt;em&gt;Photos in your timeline are cropped to improve consistency and to allow you to see more Tweets at a glance&lt;/em&gt;”. Err, what exactly is “consistency”, and why is maximizing the number of tweets-per-glance a worthy metric to pursue? Is consistency the same as the area under the curve (AUC), normalized scan-path saliency (NSS), and similarity (SIM) metrics? Does hitting high numbers on the CAT2000 and MIT1003 saliency benchmarks guarantee good user experience and user wellness? (&lt;em&gt;Psst… it clearly does not, in case you missed the memo&lt;/em&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The ethics of saliency cropping&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the heart of this whole field of saliency driven image cropping lies this audacious set of borderline meta-physical beliefs:&lt;/p&gt;

&lt;p&gt;a) That there is a single, mysterious, universal notion of a ground-truth saliency distribution associated with an image, one that transcends the viewer’s lived experiences across the idiosyncratic space-time chronotopes we inhabit.&lt;/p&gt;

&lt;p&gt;b) That this universally valid saliency map can be accurately algorithmized by training a critter of the deep differentiable model menagerie, “DeepGaze” or something else that breaches the so-called “state of the art”.&lt;/p&gt;

&lt;p&gt;In fact, I wonder whether this technology causes more harm via the deployment of an &lt;em&gt;under-performing&lt;/em&gt; model or a &lt;em&gt;well-performing&lt;/em&gt; one (in the puritanical Machine Learning sense, that is). Doing eye-tracking-data-driven algorithmic saliency cropping might actually end up faithfully reproducing and promulgating correlative human foibles, like the &lt;em&gt;male gaze&lt;/em&gt;, latent in the collected dataset, which will only alienate the self-identified female users of the platform. I am unsure if that is what Twitter wants.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*Dq4Q1z4Km8KRvKV9&quot; alt=&quot;&quot; /&gt;Source: &lt;a href=&quot;https://www.tn2magazine.ie/feminist-film-series-the-male-gaze/&quot;&gt;https://www.tn2magazine.ie/feminist-film-series-the-male-gaze/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4: The irrelevance of model rights activism (MRA) and the falsified sanctity of the 50:50 parity: Schadenfreude much?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have personally seen a fair bit of appropriation of the 40:52 result as a testament that the AI is not to be blamed and that there is demonstrably no bias. To me, it is precisely irrelevant whether the neural network weights are not where the blame lies, or whether the dataset was unbiased within your value framework. While the tools of cold rationalism, oblivious to the human condition and historicities, goad us into believing that a 50:50 parity is sacrosanct, I strongly question that. Heck, even the Senate cloture rule literally requires a 60:40 ratio to end a debate and move to a vote. 51 is too ostentatious, is it?&lt;br /&gt;
Is there a universal cut-off ratio above which &lt;em&gt;fairness&lt;/em&gt; is dramatically ushered in? All we have are putatively reasonable cutoffs, and it doesn’t take an SJW to realize that these cutoffs are put in place in an ad hoc fashion, usually by the ones in power. &lt;em&gt;Lest we forget that statisticians are still grappling with the 0.05 p-value devil put in place by a eugenicist patriarch&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;To conclude, as with anything on twitter, the response was noisy, instantaneous and, of course, hard to quickly summarize. The lack of sound participatory design principles has, IMHO, rendered Twitter timelines amplifiers of both the &lt;em&gt;Baader-Meinhof selective attention bias&lt;/em&gt; and &lt;em&gt;denominator blindness&lt;/em&gt;. The momentum of the narrative far trumps empathetic thinking, and I oft find myself culpable of that as well.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Here’s a bunch of parting salvos that I’d like to sign off with.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Yes! I noticed visceral schadenfreude by the ‘No racism’ MRA crowd. To those from this crowd reading this essay, here’s my &lt;a href=&quot;https://twitter.com/vinayprabhu&quot;&gt;twitter link&lt;/a&gt; so that it’s easier to unfollow and block me :)&lt;/li&gt;
  &lt;li&gt;Yes! I am and you should remain suspicious of any &amp;amp; all claims of CLEAN UNBIASED anthropogenic datasets&lt;/li&gt;
  &lt;li&gt;Yes! Large tracts of Behavioral research &amp;amp; psychonomics are seriously troublesome&lt;/li&gt;
  &lt;li&gt;Yes! CFD does have MANY serious shortcomings. I am addressing these in a forthcoming paper (link to be shared here)&lt;/li&gt;
  &lt;li&gt;Yes! The &lt;em&gt;neutral white&lt;/em&gt; background is NOT ‘realistic’&lt;/li&gt;
  &lt;li&gt;Yes! It is a red herring fallacy to suggest that this study somehow indicates I believe in binarized race classification or similarly garbage ideas like binarized gender.&lt;/li&gt;
  &lt;li&gt;Nope. This study was not meant to assign, nor did it assign, a clean chit to anybody’s algos.&lt;/li&gt;
  &lt;li&gt;Nope. Black &amp;amp; brown faces are not ‘corner cases’ and that type of phraseology is mighty troublesome.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have a nice thoughtful week ahead, and remember:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fcuk the algorithm.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always!&lt;/strong&gt;&lt;/p&gt;

&lt;h4 id=&quot;addition-the-second-installment-of-this-blog-series-can-be-found-here-httpsmediumcomvinayprabhuscrutinizing-saliency-based-image-cropping-6b7a70cfb4f1&quot;&gt;Addition: The second installment of this blog series can be found here: &lt;a href=&quot;https://medium.com/@VinayPrabhu/scrutinizing-saliency-based-image-cropping-6b7a70cfb4f1&quot;&gt;https://medium.com/@VinayPrabhu/scrutinizing-saliency-based-image-cropping-6b7a70cfb4f1&lt;/a&gt;&lt;/h4&gt;
</description>
        <pubDate>Mon, 21 Sep 2020 17:10:31 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2020/09/21/on-the-twitter-cropping-controversy-critique-clarifications-and-comments/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2020/09/21/on-the-twitter-cropping-controversy-critique-clarifications-and-comments/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
      <item>
        <title>Disentangling disentanglement in Deep Learning</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;disentangling-disentanglement-footnotes-from-neurips--2019&quot;&gt;Disentangling disentanglement: Footnotes from NEURIPS — 2019&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*E5aAW6C8ljRXoSjVY4g3Jg.jpeg&quot; alt=&quot;&quot; /&gt;Credit: ‘ Striving for Disentanglement’ by Simon Greig — &lt;a href=&quot;https://www.flickr.com/photos/xrrr/500039281/&quot;&gt;https://www.flickr.com/photos/xrrr/500039281/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TL;DR: &lt;em&gt;Disentangling disentanglement. Via this blog-post, I intend to summarize all of the dozen papers on disentanglement in deep learning presented at this year’s NEURIPS 2019 in Vancouver.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Companion Github repo replete with paper summaries and cheat sheets: &lt;a href=&quot;https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019/tree/master/Figures&quot;&gt;https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019/&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;background-disentanglement-in-representation-learning&quot;&gt;Background: Disentanglement in Representation learning&lt;/h3&gt;

&lt;p&gt;On the Thursday evening of the conference week, as I sauntered around the poster session in the massive east exhibition halls of the Vancouver convention center, I realized that I had chanced upon probably the fifth poster in the past couple of days analyzing a disentanglement framework that its authors had worked on.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*JMsL0AeY1Yt2d_h9yfvfmA.png&quot; alt=&quot;&quot; /&gt;Fig 1: (Yet another) Poster on disentanglement at this year’s NEURIPS&lt;/p&gt;

&lt;p&gt;A quick check of the &lt;a href=&quot;https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019&quot;&gt;proceedings&lt;/a&gt; led me to this stunning statistic: a total of, I-KID-YOU-NOT, a &lt;strong&gt;dozen&lt;/strong&gt; papers were accepted this year with the term ‘&lt;strong&gt;DISENTANGLEMENT&lt;/strong&gt;’ in the title, and there were at least a few more that I chanced upon in the multitude of workshops. (There were 20+ papers and talks during the 2017 NEURIPS workshop on Learning Disentangled Representations: from Perception to Control — &lt;a href=&quot;https://sites.google.com/view/disentanglenips2017&quot;&gt;https://sites.google.com/view/disentanglenips2017&lt;/a&gt;, and we had a challenge workshop this year as well: &lt;a href=&quot;https://www.aicrowd.com/challenges/neurips-2019-disentanglement-challenge&quot;&gt;https://www.aicrowd.com/challenges/neurips-2019-disentanglement-challenge&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;I had first encountered this flavor of usage of the term in statistical learning during the last stages of my doctoral journey at CMU (circa 2013), when I read ‘&lt;a href=&quot;https://arxiv.org/pdf/1305.0445.pdf&quot;&gt;Deep Learning of Representations: Looking Forward&lt;/a&gt;’ by Yoshua Bengio, in which he emphasized the need to be ‘.. learning to &lt;em&gt;disentangle&lt;/em&gt; the factors of variation underlying the observed data’. (How I wish he still authored such single-author papers.)&lt;/p&gt;

&lt;p&gt;As it turns out, much to the chagrin of the physicists perhaps, if you are working on teasing out visual style from digit type on MNIST, or separating &lt;em&gt;shape and pose in images of human bodies and facial features from facial shape on CelebA&lt;/em&gt; or grappling with unwrapping the effects of mixture ratio of the two constituent compounds and environmental factors such as thermal fluctuation in images generated for microstructure growth, you are &lt;em&gt;disentangling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There seems to be no consensus on what the term precisely means or what metric(s) capture the extent of it, an observation that is confirmed by a rather funny/snarky slide in Stefano Soatto’s talk at IPAM (refer to the playlist below).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*4XdPFL9IiZM63uh8&quot; alt=&quot;&quot; /&gt;Fig 2: Invariance and disentanglement in deep representations&lt;/p&gt;

&lt;p&gt;That said, this is not a case of there existing a mere smattering of empirical experiments that all use their own customized notion of disentanglement. In fact, reasonably rigorous frameworks have been proposed harnessing powerful tools from areas such as variational inference, Shannonian information theory, group theory and matrix factorization. Deepmind’s group-theoretic treatment of the subject seems to have perched itself as one of the go-to frameworks. In case you are looking for a succinct 3-min recap of what this is, please refer to this &lt;a href=&quot;https://www.youtube.com/watch?v=PeZIo0Q_GwE&amp;amp;t=420s&quot;&gt;video&lt;/a&gt; that I saw during one of the Simons Institute workshops (around the 7th minute). (A very detailed talk from one of the main authors of the Deepmind group can be found &lt;a href=&quot;https://www.youtube.com/watch?v=XNGo9xqpgMo&quot;&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*cG25-15Br6ZP57ot&quot; alt=&quot;&quot; /&gt;Fig 3: Group theoretic framework for disentanglement&lt;/p&gt;
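
&lt;p&gt;For concreteness, here is a hedged paraphrase of the group-theoretic definition sketched in Fig 3 (after Higgins et al.’s ‘Towards a Definition of Disentangled Representations’); the notation below is mine, so treat it as a sketch rather than the canonical statement:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% Let a group G = G_1 \times \dots \times G_n act on the world states W,
% and let f : W \to Z be the inference map into representation space Z.
% Z is a disentangled representation w.r.t. this decomposition of G if:
%   (1) G also acts on Z, and f is equivariant:
%         f(g \cdot w) = g \cdot f(w)   \forall g \in G, \, w \in W
%   (2) Z decomposes as Z = Z_1 \times \dots \times Z_n, where each Z_i
%       is affected only by the corresponding subgroup G_i and is
%       invariant to the action of every other G_j, j \neq i.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;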

&lt;h4 id=&quot;a-birds-view-of-the-papers-presented&quot;&gt;A bird’s view of the papers presented&lt;/h4&gt;

&lt;p&gt;In Fig 4 below is a bird’s-eye view of the 12 papers presented. I roughly bucketized them into two subsections depending on whether the &lt;em&gt;main&lt;/em&gt; perceived goal of the paper (from my humble viewpoint) was to either analyze and/or critique the properties of a pre-existing framework, or to harness one and apply it to an interesting problem domain. Bear in mind that this is admittedly a rather simplistic categorization: it says nothing about whether the application-oriented papers also critiqued and analyzed the frameworks they used, or whether the analysis/critique papers included real-world applications.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*BMuHRXrKqbBEHR8r&quot; alt=&quot;&quot; /&gt;Fig 4: Disentanglement papers categorization (NEURIPS -2019)&lt;/p&gt;

&lt;p&gt;(You can find the pdf version with the paper links here: &lt;a href=&quot;https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019/blob/master/Disentanglement_papers_tree-diagram.pdf&quot;&gt;https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019/blob/master/Disentanglement_papers_tree-diagram.pdf&lt;/a&gt; )&lt;/p&gt;

&lt;h4 id=&quot;what-do-they-mean-by-disentanglement&quot;&gt;What do they mean by disentanglement?&lt;/h4&gt;

&lt;p&gt;In order to summarize the contexts in which disentanglement was used in these papers, I created a look-up table (see Table-1). In those cases where the authors did not have a subsection explicitly dedicated to defining the term, I improvised and extracted the gist (hence the caveat [improv]).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*9GlJnSREcytGy2QI305YgQ.png&quot; alt=&quot;&quot; /&gt;Table-1(a) Disentanglement context in the application papers&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*7NKausyLKSw4a35U7wSfQA.png&quot; alt=&quot;&quot; /&gt;Table-1(b) Disentanglement context in the analysis papers&lt;/p&gt;

&lt;h4 id=&quot;reproducibility-and-open-sourced-code&quot;&gt;Reproducibility and open-sourced code:&lt;/h4&gt;

&lt;p&gt;In keeping with the strong and growing trend towards open-sourcing the code used to produce the results, 10 of the 12 author-groups shared their github repos as well. This is captured in Table-2 below:&lt;/p&gt;

&lt;p&gt;Table-2: Papers and the open-source code links&lt;/p&gt;

&lt;h4 id=&quot;what-now-some-ideas&quot;&gt;What now? Some ideas..&lt;/h4&gt;

&lt;p&gt;[Here are some scribbles to try and guilt myself into working on this more seriously. Please take these with a grain of salt or 12 :) ]&lt;/p&gt;

&lt;p&gt;1: Survey paper detailing the definitions, frameworks and metrics to be used.&lt;/p&gt;

&lt;p&gt;2: Disentangling author/writing style/nation of origin using Kannada-MNIST dataset. (65 native volunteers from India and 10 non-native volunteers from USA)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/vinayprabhu/Kannada_MNIST&quot;&gt;https://github.com/vinayprabhu/Kannada_MNIST&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3: It’s somewhat surprising that no one’s tried throwing a K-user interference channel model at entanglement and seeing if an Interference Alignment [&lt;a href=&quot;https://arxiv.org/pdf/0707.0323.pdf&quot;&gt;https://arxiv.org/pdf/0707.0323.pdf&lt;/a&gt;]-like trick works for dSprites-like datasets.&lt;/p&gt;

&lt;p&gt;4: Disentangling Shoe type, pocket and device location from Gait representations&lt;/p&gt;

&lt;p&gt;5: Bridging the body of work pertaining to &lt;a href=&quot;http://www.ee.cuhk.edu.hk/~wkma/publications/slides-%20HU-%20%20CWHISPERS%202015.pdf&quot;&gt;(Hyperspectral) Unmixing&lt;/a&gt; / Blind source separation and disentangled representation learning.&lt;/p&gt;

&lt;h3 id=&quot;resource-list&quot;&gt;Resource list:&lt;/h3&gt;

&lt;p&gt;Companion github repo replete with paper summaries and cheat sheets.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/vinayprabhu/Disentanglement_NEURIPS_2019&quot;&gt;&lt;strong&gt;vinayprabhu/Disentanglement_NEURIPS_2019&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;a-datasets-to-get-started-with&quot;&gt;A. Datasets to get started with:&lt;/h4&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://www.github.com/cianeastwood/qedr&quot;&gt;https://www.github.com/cianeastwood/qedr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://github.com/deepmind/dsprites-dataset&quot;&gt;https://github.com/deepmind/dsprites-dataset&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://github.com/rr-learning/disentanglement_dataset&quot;&gt;https://github.com/rr-learning/disentanglement_dataset&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Major props to the NeurIPS 2019 Disentanglement Challenge organizers for the resources they shared as well!)&lt;/p&gt;

&lt;p&gt;Link: &lt;a href=&quot;https://www.aicrowd.com/challenges/neurips-2019-disentanglement-challenge&quot;&gt;https://www.aicrowd.com/challenges/neurips-2019-disentanglement-challenge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/google-research/disentanglement_lib/tree/master/disentanglement_lib/evaluation/metrics&quot;&gt;&lt;strong&gt;google-research/disentanglement_lib&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;b-video-playlist&quot;&gt;B. Video playlist:&lt;/h4&gt;

&lt;p&gt;[1] Y. Bengio’s: From Deep Learning of Disentangled Representations to Higher-level Cognition&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Yr1mOzC93xs&amp;amp;t=355s&quot;&gt;https://www.youtube.com/watch?v=Yr1mOzC93xs&amp;amp;t=355s&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] β-VAE (Deepmind): &lt;a href=&quot;https://www.youtube.com/watch?v=XNGo9xqpgMo&quot;&gt;https://www.youtube.com/watch?v=XNGo9xqpgMo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] Flexibly Fair Representation Learning by Disentanglement: &lt;a href=&quot;https://www.youtube.com/watch?v=nlilKO1AvVs&amp;amp;t=27s&quot;&gt;https://www.youtube.com/watch?v=nlilKO1AvVs&amp;amp;t=27s&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] Disentangled Representation Learning GAN for Pose-Invariant Face Recognition: &lt;a href=&quot;https://www.youtube.com/watch?v=IjsBTZqCu-I&quot;&gt;https://www.youtube.com/watch?v=IjsBTZqCu-I&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] Invariance and disentanglement in deep representations (Fun talk), &lt;a href=&quot;https://www.youtube.com/watch?v=zbg49SMP5kY&quot;&gt;https://www.youtube.com/watch?v=zbg49SMP5kY&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(From NEURIPS 2019 authors)&lt;br /&gt;
[1] The Audit Model Predictions paper: &lt;a href=&quot;https://www.youtube.com/watch?v=PeZIo0Q_GwE&quot;&gt;https://www.youtube.com/watch?v=PeZIo0Q_GwE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] Twiml interview of Olivier Bachem (3 papers on this topic at NEURIPS-19): &lt;a href=&quot;https://www.youtube.com/watch?v=Gd1nL3WKucY&quot;&gt;https://www.youtube.com/watch?v=Gd1nL3WKucY&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;c-cheat-sheets&quot;&gt;C. Cheat sheets&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*S7g0qe3XEPsbfPRo&quot; alt=&quot;&quot; /&gt;Cheat sheet-1: All the abstracts! (Print on A3/2)&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/0*W4pzqNO_-5iZvK2g&quot; alt=&quot;&quot; /&gt;Cheat sheet-2: All the essences!&lt;/p&gt;
</description>
        <pubDate>Mon, 23 Dec 2019 21:06:02 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2019/12/23/disentangling-disentanglement-in-deep-learning/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2019/12/23/disentangling-disentanglement-in-deep-learning/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
      <item>
        <title>Build your own custom hotword detector with zero training data and $0!</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;build-your-own-custom-hotword-detector-with-zero-training-data-and-0&quot;&gt;Build your own custom hotword detector with zero training data and $0!&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;: &lt;em&gt;Google TTS -&amp;gt; Noise augment -&amp;gt; {wav files} -&amp;gt; SnowBoy -&amp;gt; {.pmdl models} -&amp;gt; Raspberry Pi&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OK, so it’s that time of the year again. You know, &lt;a href=&quot;https://burningman.org/&quot;&gt;there’s &lt;em&gt;that&lt;/em&gt; thing in the desert&lt;/a&gt;. Last time around, I rigged up a Google AIY vision kit and added &lt;a href=&quot;http://espeak.sourceforge.net/&quot;&gt;&lt;em&gt;espeak&lt;/em&gt;&lt;/a&gt; on &lt;a href=&quot;https://www.facebook.com/CampByteThis/&quot;&gt;Chip and Terra&lt;/a&gt;, the art installations of the motley bunch that is &lt;a href=&quot;https://www.facebook.com/CampByteThis/&quot;&gt;BŸTE: Burners for Ÿntelligent Technology Emancipation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The result was this:&lt;/p&gt;

&lt;p&gt;This time around, I decided to add an extra sensory ability: The ability to listen and respond, which in Machine Learning plain-speak translates to rigging up a hotword detection engine that’ll work on-device and offline. Easy peezy no? Hand me the beer perhaps? Read on padawan …&lt;/p&gt;

&lt;p&gt;So, I began scouting around for simple off-the-shelf solutions and chanced upon the awesome &lt;a href=&quot;https://snowboy.kitt.ai/&quot;&gt;&lt;em&gt;SnowBoy&lt;/em&gt;&lt;/a&gt; off-line hotword detector. This came with constraints, of course! You could potentially download machine learning models pre-trained to detect specific popular hotwords such as &lt;em&gt;Alexa&lt;/em&gt; and &lt;em&gt;Jarvis&lt;/em&gt; (see pic below)…&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*vCK34zOAVSaUOC6C63bvzg.png&quot; alt=&quot;&quot; /&gt; The off-the-shelf available hotword models&lt;/p&gt;

&lt;p&gt;.. but in order to truly build your own robust model for the precise custom hotword, you need ~ 500 volunteers contributing 3 samples each. In spite of dipping into my gloriously dirt-poor social media reach, I was able to muster a grand total of ~ 5 donors :’(&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*AVujJtInf02m-ztW1eMKEw.png&quot; alt=&quot;&quot; /&gt;5 donors from my social media campaign!&lt;/p&gt;

&lt;p&gt;Seeing this go nowhere, I thought of generating my own dataset. I had recently worked on a couple of synthetic-to-real-world transfer learning projects, one of which I published at the Deep-generative-models workshop at ICLR (see &lt;a href=&quot;https://arxiv.org/abs/1905.08633&quot;&gt;https://arxiv.org/abs/1905.08633&lt;/a&gt;). I figured that if WaveNet is indeed so impressive at generating realistic-sounding text-to-speech, I could dip into those sweet $300 of free cloud credits that Google doles out to do the data collection for me, and then transfer-learn into the real world with some nifty noise augmentation and, yes, the unreasonable reasonableness of a deep neural network’s ability to generalize!&lt;/p&gt;

&lt;h4 id=&quot;phase-1-generating-the-synthetic-hotword-audio-files-in-different-voices-using-google-tts&quot;&gt;&lt;strong&gt;Phase-1: Generating the synthetic hotword audio files in different voices using Google TTS&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;So, I created a temporary GC account, religiously followed the documentation and scribbled some Python code (shared here: &lt;a href=&quot;https://github.com/vinayprabhu/BurningMan2019/blob/master/generate_SYNTHETIC_audiofiles.py&quot;&gt;https://github.com/vinayprabhu/BurningMan2019/blob/master/generate_SYNTHETIC_audiofiles.py&lt;/a&gt;). In about 5 min, I had 189 .wav files of the hot-word I was targeting (which was &lt;strong&gt;&lt;em&gt;Hey Chip!&lt;/em&gt;&lt;/strong&gt;, BTW) in different accents, or, more formally, &lt;em&gt;voices&lt;/em&gt;. You can download this entire treasure trove from &lt;a href=&quot;https://github.com/vinayprabhu/BurningMan2019/blob/master/wav.zip&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*Rl41jgUHxNLMurLczDQz8A.png&quot; alt=&quot;&quot; /&gt;Using Google TTS to generate synthetic training data&lt;/p&gt;
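
&lt;p&gt;For the curious, here is a minimal sketch of what that script does, written against the current google-cloud-texttospeech client library (the 2019-era script used the older types/enums modules, so treat the exact class names as assumptions and defer to the repo for the real thing):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Sketch: synthesize 'Hey Chip!' once per available English voice.
import os
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
synthesis_input = texttospeech.SynthesisInput(text='Hey Chip!')
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16)  # 16-bit PCM .wav

os.makedirs('wav', exist_ok=True)
for v in client.list_voices(language_code='en').voices:
    voice = texttospeech.VoiceSelectionParams(
        language_code=v.language_codes[0], name=v.name)
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config)
    with open('wav/%s.wav' % v.name, 'wb') as out:
        out.write(response.audio_content)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;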

&lt;p&gt;Some of the example sounds that were my favorite were these:&lt;/p&gt;

&lt;p&gt;Example sound files from the Google TTS engine!&lt;/p&gt;

&lt;p&gt;Now that I had these 189 .wav files, one per voice, I performed plain-vanilla additive white Gaussian noise augmentation on each of them to get (189 x 3) wav files, as sketched below. Here is the &lt;a href=&quot;https://github.com/vinayprabhu/BurningMan2019/blob/master/Colab_Notebooks/wav_augmentation.ipynb&quot;&gt;colab notebook&lt;/a&gt; associated with this task.&lt;/p&gt;
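
&lt;p&gt;A minimal sketch of that augmentation step, assuming 16-bit PCM .wav inputs; the SNR levels below are illustrative, and the notebook linked above remains the ground truth:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# AWGN augmentation: clean + 2 noisy copies = 3 wavs per voice.
import glob
import numpy as np
from scipy.io import wavfile

for path in glob.glob('wav/*.wav'):
    if '_snr' in path:
        continue  # do not re-augment an already-noisy copy
    rate, clean = wavfile.read(path)
    clean = clean.astype(np.float64)
    signal_power = np.mean(clean ** 2)
    for snr_db in [20, 10]:  # illustrative signal-to-noise ratios, in dB
        noise_power = signal_power / (10 ** (snr_db / 10.0))
        noisy = clean + np.random.normal(0.0, np.sqrt(noise_power), clean.shape)
        noisy = np.clip(noisy, -32768, 32767).astype(np.int16)
        wavfile.write(path.replace('.wav', '_snr%d.wav' % snr_db), rate, noisy)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;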

&lt;h4 id=&quot;phase-2-training-the-hot-word-detection-models-using-the-synthetic-noise-augmented-wav-files&quot;&gt;Phase-2: Training the hot-word detection models using the synthetic noise-augmented wav files&lt;/h4&gt;

&lt;p&gt;The &lt;em&gt;snowboy&lt;/em&gt; technology, as promising as it is, is still in its nascency. The API for training your own models programmatically looks rather restrictive:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python training_service.py 1.wav 2.wav 3.wav saved_model.pmdl
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As seen, you need to feed in precisely 3 wav files to spit out a model. So, yes, I did generate 189 models, one per voice (I’d be very glad to be proven wrong in this regard), and logically ‘OR’-ed them together; a sketch of this loop follows below. The colab notebook that ingests the wav files and trains the ML models is shared here: &lt;a href=&quot;https://github.com/vinayprabhu/BurningMan2019/blob/master/Colab_Notebooks/model_gen.ipynb&quot;&gt;https://github.com/vinayprabhu/BurningMan2019/blob/master/Colab_Notebooks/model_gen.ipynb&lt;/a&gt;&lt;/p&gt;
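
&lt;p&gt;A sketch of that per-voice loop, assuming the clean-plus-two-noisy-copies naming from the augmentation step above (the colab notebook is the authoritative version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# One .pmdl per voice: feed snowboy's trainer exactly 3 wavs per invocation.
import glob
import os
import subprocess

os.makedirs('models', exist_ok=True)
for clean in sorted(glob.glob('wav/*.wav')):
    if '_snr' in clean:
        continue  # the noisy copies are picked up alongside their clean parent
    stem = clean[:-len('.wav')]
    model = 'models/%s.pmdl' % os.path.basename(stem)
    subprocess.run(['python', 'training_service.py',
                    clean, stem + '_snr20.wav', stem + '_snr10.wav', model],
                   check=True)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;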

&lt;h4 id=&quot;phase-3-combining-together-all-the-models-and-running-them-on-the-raspberry-pi&quot;&gt;Phase-3: Combining together all the models and running them on the Raspberry Pi&lt;/h4&gt;

&lt;p&gt;OK. So, this phase was kinda tricky. Make sure that you follow this repo’s documentation rather patiently and religiously:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/wanleg/snowboyPi&quot;&gt;&lt;strong&gt;wanleg/snowboyPi&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One main source of irritation with audio projects on the Raspberry Pi is the bedeviled ALSA shenanigans, and the ensuing fight between the HDMI, USB-audio and local-audio-out ports for the audio-dominion. In order to circumvent that, I used the good ol’ SAMSON mike-in + audio-out rig. (Apparently Amazon &lt;a href=&quot;https://www.amazon.com/Samson-Mic-Portable-Condenser-Microphone/dp/B001R76D42/ref=asc_df_B001R76D42/?tag=hyprod-20&amp;amp;linkCode=df0&amp;amp;hvadid=312039437910&amp;amp;hvpos=1o4&amp;amp;hvnetw=g&amp;amp;hvrand=1977879924165500956&amp;amp;hvpone=&amp;amp;hvptwo=&amp;amp;hvqmt=&amp;amp;hvdev=c&amp;amp;hvdvcmdl=&amp;amp;hvlocint=&amp;amp;hvlocphy=9031928&amp;amp;hvtargid=pla-318320044466&amp;amp;psc=1&quot;&gt;peddles these&lt;/a&gt; at $29.99 now! They are much cheaper on SP road, Bangalore.)&lt;/p&gt;

&lt;p&gt;The entire setup looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*bie-m17myj3H0o9IQfCwAw.png&quot; alt=&quot;&quot; /&gt;The raspberry pi-3 set-up with the Samson mike&lt;/p&gt;

&lt;p&gt;Before beginning the, ahem, human trials, I tried to isolate the effect of the mike being used by means of a simple test case: I played the Google TTS output audio file on a laptop and checked whether the corresponding .pmdl running on the Raspberry Pi would indeed get triggered by the synthetic utterance. The result was gloriously good!&lt;/p&gt;

&lt;p&gt;Synthetic data input trials&lt;/p&gt;

&lt;p&gt;Now that there was some hope, I began tweaking the snowboy.py script (from here: &lt;a href=&quot;https://github.com/wanleg/snowboyPi/blob/master/snowboy.py&quot;&gt;https://github.com/wanleg/snowboyPi/blob/master/snowboy.py&lt;/a&gt;) to include all the .pmdl model files I had just generated, so that at least one would get triggered when a real-world homo sapien uttered the key words ‘Hey Chip!’. It turns out all you need to do is add the list of models on line#29 here: &lt;a href=&quot;https://github.com/wanleg/snowboyPi/blob/master/snowboy.py#L29&quot;&gt;https://github.com/wanleg/snowboyPi/blob/master/snowboy.py#L29&lt;/a&gt;&lt;/p&gt;
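
&lt;p&gt;In the vanilla snowboy demo API, the same trick looks roughly like this. A sketch only: &lt;em&gt;HotwordDetector&lt;/em&gt; accepts a list of model paths, which behaves as the logical ‘OR’ mentioned earlier, and the model filenames here are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Arm the detector with all 189 per-voice models at once.
import glob
import snowboydecoder

models = sorted(glob.glob('models/*.pmdl'))
detector = snowboydecoder.HotwordDetector(models, sensitivity=[0.5] * len(models))

def on_hotword():
    print('Hey Chip! heard -- trigger the espeak response here')

# One callback per model, all routed to the same handler.
detector.start(detected_callback=[on_hotword] * len(models), sleep_time=0.03)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;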

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;FINALE:&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Now that all the files are packed in (you never have to worry about the size of these individual DNN models, by the way: they are all ~10KB each. Impressive work,&lt;/em&gt;&lt;a href=&quot;http://docs.kitt.ai/snowboy/&quot;&gt; &lt;em&gt;SnowBoy people&lt;/em&gt;&lt;/a&gt;&lt;em&gt;!), I did the final real-world test with real human voice inputs and the result was .. *drumrolls*.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ta..Da! It works! It works rather seamlessly with my normal (Indian accented) voice and my Adam Levine voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase-4: Deployment on Chip and Terra + casing + playa deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I am awaiting this phase with bated breath. I’ll update this blogpost as soon as I get back to the default world!&lt;/p&gt;
</description>
        <pubDate>Mon, 19 Aug 2019 14:07:30 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2019/08/19/build-your-own-custom-hotword-detector-with-zero-training-data-and-0/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2019/08/19/build-your-own-custom-hotword-detector-with-zero-training-data-and-0/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
      <item>
        <title>A new handwritten digits dataset in ML town: Kannada-MNIST</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;kannada-mnista-new-handwritten-digits-dataset-in-ml-town&quot;&gt;Kannada-MNIST: A new handwritten digits dataset in ML town&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/alignchronicles/assets/images/posts/a-new-handwritten-digits-dataset-in-ml-town-kannada-mnist-img-1.png&quot; alt=&quot;&quot; /&gt;Class-wise mean images of the 10 handwritten digits in the Kannada MNIST dataset&lt;/p&gt;

&lt;h3 id=&quot;tldr&quot;&gt;TLDR:&lt;/h3&gt;

&lt;p&gt;I am disseminating 2 datasets:&lt;br /&gt;
&lt;strong&gt;Kannada-MNIST dataset&lt;/strong&gt;: 28 x 28 grayscale images: 60k train | 10k test&lt;br /&gt;
&lt;strong&gt;&lt;em&gt;Dig&lt;/em&gt;-MNIST&lt;/strong&gt;: 28 x 28 grayscale images: 10,240 (1024 x 10) {see pic below}&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*umXsR1L29KkRoaA6PqgzzA.png&quot; alt=&quot;&quot; /&gt;Putting the ‘Dig’ in Dig-MNIST&lt;/p&gt;

&lt;p&gt;The Kannada-MNIST dataset is meant to be a drop-in replacement for the MNIST dataset 🙏, albeit for the numeral symbols in the Kannada language.&lt;br /&gt;
Also, I am disseminating an additional dataset of 10k handwritten digits in the same language (written predominantly by non-native users of the language), called Dig-MNIST, that can be used as an additional test set.&lt;/p&gt;
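
&lt;p&gt;As a quick sanity check of the ‘drop-in’ claim, here is a minimal loading sketch; the .npz file names and the ‘arr_0’ key are illustrative assumptions, so check the GitHub repo’s data directory for the exact paths:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Load the MNIST-format tensors and hand them to any existing MNIST script.
import numpy as np

x_train = np.load('X_kannada_MNIST_train.npz')['arr_0']
y_train = np.load('y_kannada_MNIST_train.npz')['arr_0']
x_test = np.load('X_kannada_MNIST_test.npz')['arr_0']
y_test = np.load('y_kannada_MNIST_test.npz')['arr_0']

print(x_train.shape, y_train.shape)  # expected: (60000, 28, 28) (60000,)
# Any MNIST training script can now consume (x_train, y_train) unchanged.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;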

&lt;p&gt;Resource-list:&lt;/p&gt;

&lt;p&gt;GitHub 👉: &lt;a href=&quot;https://github.com/vinayprabhu/Kannada_MNIST&quot;&gt;https://github.com/vinayprabhu/Kannada_MNIST&lt;/a&gt;&lt;br /&gt;
Kaggle 👉: &lt;a href=&quot;https://www.kaggle.com/higgstachyon/kannada-mnist&quot;&gt;https://www.kaggle.com/higgstachyon/kannada-mnist&lt;/a&gt;&lt;br /&gt;
ArXiv 👉 : &lt;a href=&quot;https://arxiv.org/pdf/1908.01242.pdf&quot;&gt;https://arxiv.org/pdf/1908.01242.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you use Kannada-MNIST in a peer reviewed paper, we would appreciate referencing it as:&lt;/p&gt;

&lt;p&gt;Prabhu, Vinay Uday. “Kannada-MNIST: A new handwritten digits dataset for the Kannada language.” arXiv preprint arXiv:1908.01242 (2019).&lt;/p&gt;

&lt;p&gt;Bibtex entry:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@article{prabhu2019kannada,  
  title={Kannada-MNIST: A new handwritten digits dataset for the Kannada language},  
  author={Prabhu, Vinay Uday},  
  journal={arXiv preprint arXiv:1908.01242},  
  year={2019}  
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;introduction&quot;&gt;Introduction:&lt;/h3&gt;

&lt;p&gt;Kannada is the official and administrative language of the state of Karnataka in India, with nearly 60 million speakers worldwide. Also, as per articles 344(1) and 351 of the Indian Constitution, Kannada holds the status of being one of the 22 scheduled languages of India. The language is written using the official Kannada script, which is an abugida of the Brahmic family that traces its origins to the Kadamba script (325–550 AD).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*xaZWZBr2UfxUgTz0v_HKDg.jpeg&quot; alt=&quot;&quot; /&gt;Kannada stone inscriptions: Source: &lt;a href=&quot;https://karnatakaitihasaacademy.org/karnataka-epigraphy/inscriptions/&quot;&gt;https://karnatakaitihasaacademy.org/karnataka-epigraphy/inscriptions/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Distinct glyphs are used to represent the numerals 0–9 in the language, and they appear distinct from the modern Hindu-Arabic numerals in vogue in much of the world today. Unlike some of the other archaic numeral systems, these numerals are very much used in day-to-day affairs in Karnataka, as evinced by the prevalence of these glyphs on license plates of vehicles captured in the pic below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*3-9oPIx2OpN6K_0LT6TIew.jpeg&quot; alt=&quot;&quot; /&gt;A vehicle license plate with Kannada numeral glyphs&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*ZA-VdN1bW1mzhBRhL7u-tQ.png&quot; alt=&quot;&quot; /&gt;MNIST-ized renderings of the variations of the glyphs across the modern Kannada fonts&lt;/p&gt;

&lt;p&gt;This figure here captures the MNIST-ized renderings of the variations of the glyphs across the following modern fonts: &lt;em&gt;Kedage, Malige-i, Malige-n, Malige-b, Kedage-n, Malige-t, Kedage-t, Kedage-i, Lohit-Kannada, Sampige and Hubballi-Regular&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;dataset-curation&quot;&gt;Dataset curation:&lt;/h3&gt;

&lt;h4 id=&quot;kannada-mnist&quot;&gt;Kannada-MNIST:&lt;/h4&gt;

&lt;p&gt;65 volunteers were recruited in Bangalore, India, who were native speakers of the language as well as day-to-day users of the numeral script. Each volunteer filled out an A3 sheet containing a 32 × 40 grid. This yielded filled-out A3 sheets containing 128 instances of each number, which we posit is large enough to capture most of the natural intra-volunteer variations of the glyph shapes. All of the sheets thus collected were scanned at 600 dots-per-inch resolution using a Konica Accurio-Press-C6085 scanner, which yielded 65 png images of size 4963 × 3509.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*IPd-9QTVqRKVZT55kXGz9A.jpeg&quot; alt=&quot;&quot; /&gt;Volunteers helping curate the Kannada-MNIST dataset&lt;/p&gt;

&lt;h4 id=&quot;dig-mnist&quot;&gt;Dig-MNIST:&lt;/h4&gt;

&lt;p&gt;8 volunteers aged 20 to 40 were recruited to generate a 32 × 40 grid of Kannada numerals (akin to 2.1), all written with a black-ink Zebra Z-Grip Series pen on a commercial Mead Cambridge Quad Writing Pad (8–1/2” x 11”, Quad Ruled, White, 80 Sheets/Pad). We then scanned the sheet(s) using a Dell S3845cdn scanner with the following settings:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Output color: Grayscale&lt;/li&gt;
  &lt;li&gt;Original type: Text&lt;/li&gt;
  &lt;li&gt;Lighten/Darken: Darken+3&lt;/li&gt;
  &lt;li&gt;Size: Auto-detect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reduced size of the sheets used for writing the digits (US-letter vis-a-vis A3) resulted in smaller scan (.tif) images that were all approximately 1600 × 2000.&lt;/p&gt;

&lt;h3 id=&quot;comparisons-with-mnist&quot;&gt;Comparisons with MNIST:&lt;/h3&gt;

&lt;p&gt;1: Mean pixel-intensities distribution:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*IvUn9GCrsxh057tiOYFQjQ.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;2: Morphological properties:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*a--YyuBlcI-wwpVBZtRynA.png&quot; alt=&quot;&quot; /&gt;Code source: &lt;a href=&quot;https://github.com/dccastro/Morpho-MNIST&quot;&gt;https://github.com/dccastro/Morpho-MNIST&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3: PCA-analysis:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*qBy0LQ1FdXM9luO5zU9HKQ.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;4: UMAP visualizations:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*zRDnLyX77tHh2l6_2lLniw.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;some-classification-bench-marking&quot;&gt;Some classification bench-marking:&lt;/h3&gt;

&lt;p&gt;I used a &lt;em&gt;standard&lt;/em&gt; MNIST-CNN architecture to get some basic accuracy benchmarks (See fig below)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*dGhUnlrByhN62gaXEAw4CQ.png&quot; alt=&quot;&quot; /&gt;The CNN architecture used for the benchmarks&lt;/p&gt;
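
&lt;p&gt;For reference, here is a Keras sketch of such a &lt;em&gt;standard&lt;/em&gt; MNIST-style CNN; the layer sizes are illustrative rather than a readout of the exact architecture in the figure, and the .npz paths repeat the assumptions from the loading sketch above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# A generic MNIST-style CNN benchmark on the Kannada-MNIST tensors.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.load('X_kannada_MNIST_train.npz')['arr_0'][..., None] / 255.0
y_train = np.load('y_kannada_MNIST_train.npz')['arr_0']

model = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_split=0.1)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;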

&lt;h4 id=&quot;a-train-on-kannada-mnist-train-and-test-on-kannada-mnist-test&quot;&gt;(a) Train on Kannada-MNIST train and test on Kannada-MNIST test&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*pAQ9q-uurYH6nha46uV0Xg.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;b-train-on-kannada-mnist-train-and-test-on-dig-mnist&quot;&gt;(b) Train on Kannada-MNIST train and test on Dig-MNIST&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*xqk2EF2mFJd08acxKw-3pA.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;open-challenges-to-the-machine-learning-community&quot;&gt;Open challenges to the machine learning community&lt;/h3&gt;

&lt;p&gt;We propose the following open challenges to the machine learning community at large.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;To characterize the nature of catastrophic forgetting when a CNN pre-trained on MNIST is retrained with Kannada-MNIST. This is particularly interesting given the observation that the typographical glyphs for 3 and 7 in Kannada-MNIST bear an uncanny resemblance to the glyph for 2 in MNIST.&lt;/li&gt;
  &lt;li&gt;Train a model on purely synthetic data generated using the fonts (as in [1]), with augmentation, and achieve high accuracy on the Kannada-MNIST and Dig-MNIST datasets.&lt;/li&gt;
  &lt;li&gt;Replicate the procedure described in the paper across different languages/scripts, especially the Indic scripts.&lt;/li&gt;
  &lt;li&gt;With regards to the Dig-MNIST dataset, we saw that some of the volunteers had transgressed the borders of the grid, and hence some of the images either contain only a partial slice of the glyph/stroke or could arguably belong to either of two different classes. For these images, it would be worthwhile to see if we can design a classifier that will allocate proportionate softmax masses to the candidate classes.&lt;/li&gt;
  &lt;li&gt;The main reason behind us sharing the raw scan images was to foster research into auto-segmentation algorithms that will parse the individual digit images from the grid, which might in turn lead to higher quality of images in the upgraded versions of the dataset.&lt;/li&gt;
  &lt;li&gt;Achieve MNIST-level accuracy by training on the Kannada-MNIST dataset and testing on the Dig-MNIST dataset without resorting to image pre-processing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;[1] Prabhu, Vinay Uday, Sanghyun Han, Dian Ang Yap, Mihail Douhaniaris, Preethi Seshadri, and John Whaley. “Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification.” &lt;em&gt;arXiv preprint arXiv:1905.08633&lt;/em&gt; (2019). [ &lt;a href=&quot;https://arxiv.org/abs/1905.08633&quot;&gt;https://arxiv.org/abs/1905.08633&lt;/a&gt; ]&lt;/p&gt;
</description>
        <pubDate>Mon, 12 Aug 2019 07:02:31 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2019/08/12/a-new-handwritten-digits-dataset-in-ml-town-kannada-mnist/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2019/08/12/a-new-handwritten-digits-dataset-in-ml-town-kannada-mnist/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
      <item>
        <title>Cats to crazy quilts: Using style transfer to generate adversarial examples</title>
        <description>&lt;hr /&gt;

&lt;h3 id=&quot;cats-to-crazy-quilts-using-style-transfer-to-generate-adversarial-examples&quot;&gt;Cats to crazy quilts: Using style transfer to generate adversarial examples&lt;/h3&gt;

&lt;h3 id=&quot;prelude&quot;&gt;&lt;strong&gt;Prelude:&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Let’s begin with a simple introduction into the world of adversarial inputs. These are inputs into a &lt;a href=&quot;https://hackernoon.com/tagged/machine-learning&quot;&gt;machine learning&lt;/a&gt; classifier that have been shrewdly perturbed in such a way that the changes are near-damn invisible to the naked eye but can fool the classifier into predicting either an arbitrary wrong class (untargeted) or a specific wrong class (targeted).&lt;/p&gt;
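
&lt;p&gt;In symbols, the standard formulation looks roughly as follows (the notation is mine, a sketch rather than a quote from any one paper):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% Given a classifier f and an input x with true label y, find a
% perturbation \delta with \|\delta\|_\infty \le \epsilon such that
%
%   untargeted:  \arg\max_c f_c(x + \delta) \neq y
%   targeted:    \arg\max_c f_c(x + \delta) = t,   t \neq y
%
% while x + \delta remains visually indistinguishable from x.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;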

&lt;p&gt;There are two defining images that come to my mind when I think of this field at large. The first one is the classic &lt;em&gt;Panda-to-Nematode&lt;/em&gt; image from &lt;a href=&quot;https://arxiv.org/pdf/1412.6572.pdf&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*qEDDXIassDlGeL1fnuumCw.png&quot; alt=&quot;&quot; /&gt;The now iconic example of a panda’s image getting perturbed into a gibbon (Source: &lt;a href=&quot;https://arxiv.org/pdf/1412.6572.pdf&quot;&gt;https://arxiv.org/pdf/1412.6572.pdf&lt;/a&gt; )&lt;/p&gt;

&lt;p&gt;The second one, is this one below that provides a geometrical perspective on where these adversarial inputs actually reside.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*r-vRrUZgjRi-DesNCM8XtQ.png&quot; alt=&quot;&quot; /&gt;An image that provides a geometrical perspective on the adversarial inputs (Source: &lt;a href=&quot;https://infoscience.epfl.ch/record/229872/files/spm_preprint.pdf&quot;&gt;https://infoscience.epfl.ch/record/229872/files/spm_preprint.pdf&lt;/a&gt; )&lt;/p&gt;

&lt;p&gt;Where I &lt;a href=&quot;https://unify.id/labs/&quot;&gt;work&lt;/a&gt;, harnessing adversarial examples in a non-computer vision setting for dataset augmentation (to increase both robustness and generalizatibity) forms a key part of our pipeline. In this regard, we have disseminated a few humble attempts such as &lt;a href=&quot;https://unify.id/2017/07/21/vulnerability-of-deep-learning-based-gait-biometric-recognition-to-adversarial-perturbations-2/&quot;&gt;&lt;em&gt;Vulnerability of deep learning-based gait biometric recognition to adversarial perturbations&lt;/em&gt;&lt;/a&gt; &lt;em&gt;,&lt;/em&gt;&lt;a href=&quot;https://unify.id/wp-content/uploads/2018/03/greybox_attack.pdf&quot;&gt;&lt;em&gt;On grey-box adversarial attacks and transfer learning&lt;/em&gt;&lt;/a&gt; &lt;em&gt;and&lt;/em&gt;&lt;a href=&quot;https://unify.id/wp-content/uploads/2018/03/lyap_e.pdf&quot;&gt; &lt;em&gt;On Lyapunov exponents and adversarial perturbations.&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently, while dabbling with the idea of using &lt;strong&gt;&lt;em&gt;interpolated style transfer&lt;/em&gt;&lt;/strong&gt; to generate mutually adversarial pairs of images, I chanced upon the fuzziness surrounding one of the more fundamental questions of machine learning: what constitutes a true label, and how do machine learning companies offering commercial off-the-shelf (OTS) &lt;a href=&quot;https://hackernoon.com/tagged/apis&quot;&gt;APIs&lt;/a&gt; define the same?&lt;/p&gt;

&lt;h3 id=&quot;tldr&quot;&gt;&lt;strong&gt;TLDR:&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;1: We describe an experiment that entailed using style transferred images to target mis-classification in the context of a specific popular commercial off-the-shelf (OTS) API (I use the &lt;em&gt;Watson Visual-Recognition- V3 API, version 2016–05–20&lt;/em&gt; API for all the results shown here.)&lt;/p&gt;

&lt;p&gt;2: The style transferred images achieved adversarial attack success rates of 97.5% (195 out of 200).&lt;/p&gt;

&lt;p&gt;3: The goal is &lt;em&gt;not to proclaim a new blackbox attack recipe&lt;/em&gt; or to berate the commercial API used, but to merely highlight the fuzziness surrounding what constitutes a true label or a true tag. This is one account of the simple observation that, when using interpolated style transfer as a method for generating mutually adversarial pairs, the ‘&lt;em&gt;raw image&lt;/em&gt;’ that is adversarially perturbed is not necessarily a naturally occurring image but is a style-transferred image itself.&lt;/p&gt;

&lt;p&gt;4: Pitch the idea of using interpolated style transfer as a recipe for generating mutually adversarial pairs that can be used for model regularization, as well as for generating &lt;em&gt;challenging&lt;/em&gt; co-class images as inputs into training pipelines for Siamese-net-like &lt;em&gt;embedding deepnets&lt;/em&gt; trained on triplet-loss cost functions.&lt;/p&gt;

&lt;p&gt;5: Pitch the idea of using the interpolated weight as the &lt;em&gt;new semantic epsilon&lt;/em&gt; in here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*RiG6x4etj-hydp5bgpEfMw.png&quot; alt=&quot;&quot; /&gt;Time for a new &lt;strong&gt;semantic epsilon&lt;/strong&gt;?&lt;/p&gt;

&lt;h3 id=&quot;the-deep-dive&quot;&gt;&lt;strong&gt;The Deep-dive:&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;With this prelude in tow, the deep dive now begins.&lt;/p&gt;

&lt;p&gt;Let’s start by focusing on the figure below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*AjiSIwHRzFC-YqkCI29CuA.png&quot; alt=&quot;&quot; /&gt;Cat2Fabric: The journey of a cat’s image into a pattern&lt;/p&gt;

&lt;p&gt;What we see is the &lt;em&gt;journey&lt;/em&gt; of the image of a cat getting style-transferred into a ‘pattern-style-image’ using the &lt;em&gt;arbitrary image stylization&lt;/em&gt; [2] &lt;a href=&quot;https://github.com/tensorflow/magenta/tree/master/magenta/models/arbitrary_image_stylization&quot;&gt;&lt;em&gt;Magenta&lt;/em&gt;&lt;/a&gt; project, for different interpolation weights monotonically increasing from 0 to 1 (from the left to the right). As seen, with the raw image (interpolation weight &lt;em&gt;w=0&lt;/em&gt;) or style-transferred images with low interpolation weights (up until interpolation weight &lt;em&gt;w=0.1&lt;/em&gt;) as inputs, the commercial OTS classification API has, as expected, &lt;em&gt;correctly&lt;/em&gt; classified the image as a &lt;strong&gt;&lt;em&gt;cat&lt;/em&gt;&lt;/strong&gt; with high confidence scores (0.97 to 0.99). When we increase the interpolation weight slightly to &lt;em&gt;w=0.15&lt;/em&gt;, we see a dramatic change in the inferred label landscape. The top guessed classes dramatically change from &lt;em&gt;feline, cat and carnivore&lt;/em&gt; to &lt;em&gt;cellophane, moth and invertebrate&lt;/em&gt;.&lt;br /&gt;
While the two images are virtually indistinguishable to the naked eye and are merely &lt;em&gt;0.03&lt;/em&gt; apart in terms of the structural similarity distance (which is &lt;em&gt;1 - structural similarity index&lt;/em&gt; [4]), and &lt;em&gt;0.125&lt;/em&gt; apart in terms of the infinity-norm distance, the labels assigned to the two images by the black-box classifier turn out to be wildly different.&lt;br /&gt;
Thus, we refer to this pair as constituting a &lt;em&gt;mutually adversarial pair&lt;/em&gt; with regards to the black-box classifier and the distance metrics used. The local-texture-based features that the classifier might have learned have perhaps coaxed it into making an erroneous classification, while the image still clearly looks like that of a cat. A natural query now emerges: did the artistically style-transferred, &lt;strong&gt;synthetically generated&lt;/strong&gt; image (with &lt;em&gt;w=0.1&lt;/em&gt;) &lt;em&gt;deserve&lt;/em&gt; to be classified as a &lt;em&gt;cat&lt;/em&gt; in the first place? This is akin to the related question of what the normative expected class is when the input is a real-world figurine rather than an animate being, which brings us to the figure below.&lt;/p&gt;
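
&lt;p&gt;A minimal sketch of the two distances quoted above, assuming uint8 RGB images of equal size; scikit-image supplies the structural similarity index (older versions spell the channel argument &lt;em&gt;multichannel=True&lt;/em&gt; instead of &lt;em&gt;channel_axis&lt;/em&gt;):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Structural-similarity distance (1 - SSIM) and infinity-norm distance.
import numpy as np
from skimage.metrics import structural_similarity

def pair_distances(img_a, img_b):
    d_ssim = 1.0 - structural_similarity(img_a, img_b, channel_axis=2)
    a = img_a.astype(np.float64) / 255.0
    b = img_b.astype(np.float64) / 255.0
    d_inf = np.abs(a - b).max()
    return d_ssim, d_inf

# For the cat pair above, these came out to ~0.03 and ~0.125 respectively.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;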

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*zV8e46uZGmt9kfp8qT_VfA.png&quot; alt=&quot;&quot; /&gt;Is this a ‘Cat’ or a ‘Cat-figurine’?&lt;/p&gt;

&lt;p&gt;Here, we see the input image, literally that of an artistic cat figurine, resulting in a high-confidence classification of &lt;em&gt;cat&lt;/em&gt; (confidence score &lt;em&gt;0.89&lt;/em&gt;). (The image was sourced from &lt;a href=&quot;https://www.wayfair.com/keyword.php?keyword=outdoor+cat+sculptures&quot;&gt;here&lt;/a&gt;; we find this specific shopping portal to be an especially good source of such figurine-art examples.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specifics of the experimentation procedure:&lt;/strong&gt;&lt;br /&gt;
It is legitimate to ask if the cat example discussed above was idiosyncratically chosen. In order to assuage those concerns, we did the following experiment.&lt;br /&gt;
The main question behind the experiment was as follows:&lt;br /&gt;
&lt;em&gt;Do images that are style transferred with a low global interpolation weight indeed result in mis-classifications?&lt;/em&gt; For this, we extracted 200 randomly chosen cat images from the &lt;a href=&quot;https://www.kaggle.com/c/dogs-vs-cats&quot;&gt;&lt;em&gt;Kaggle Dogs and Cats&lt;/em&gt;&lt;/a&gt; dataset. We resized all of them to size 299 x 299 and style transferred each one of them using the same style image, extracted from the DTD dataset [1], with the style transfer algorithm detailed in [2]. The figure below showcases this with a specific example.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*6vPS_fdqMLuhg9qQVGHXiQ.png&quot; alt=&quot;&quot; /&gt;The concept&lt;/p&gt;

&lt;p&gt;In order to ensure that the images still looked ‘cat-like’, the interpolation weight was set to a &lt;em&gt;low&lt;/em&gt; value of &lt;em&gt;0.125&lt;/em&gt;.&lt;br /&gt;
One can sift through all the raw images and the style transferred images in the gif animation below.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*9GTAeyaRRPIYLOAqYdjWhA.gif&quot; alt=&quot;&quot; /&gt;Gif of true images and their style transferred counterparts&lt;/p&gt;

&lt;p&gt;Now, both the raw images and the style transferred images were classified using the &lt;em&gt;Watson Visual Recognition V3 API, version 2016–05–20&lt;/em&gt;, with the following settings:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;em&gt;Accept-Language&lt;/em&gt; header string that sets the language of the output class names was set to &lt;em&gt;en&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;The &lt;em&gt;owners&lt;/em&gt; query array was set to the default option (&lt;em&gt;IBM&lt;/em&gt;).&lt;/li&gt;
  &lt;li&gt;The &lt;em&gt;classifier-ids&lt;/em&gt; was set to &lt;em&gt;default&lt;/em&gt;, which requires no training and would &lt;em&gt;return classes from thousands of general tags&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;The &lt;em&gt;threshold&lt;/em&gt; query parameter, the minimum score a class must have to be returned, was set to &lt;em&gt;0.5&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The results are covered in the forthcoming section.&lt;/p&gt;
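
&lt;p&gt;A minimal sketch of one such classify call with exactly these settings, assuming the API-key query-parameter auth of that era; the service has since been retired, so treat the endpoint URL and field names as assumptions rather than a working recipe:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Classify one image with Watson Visual Recognition v3 (2016-05-20 version).
import requests

API_KEY = 'YOUR_API_KEY'  # hypothetical credential
URL = 'https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classify'

def classify(image_path):
    with open(image_path, 'rb') as f:
        resp = requests.post(
            URL,
            params={'api_key': API_KEY, 'version': '2016-05-20',
                    'owners': 'IBM', 'threshold': 0.5},
            headers={'Accept-Language': 'en'},
            files={'images_file': f})
    resp.raise_for_status()
    # Top-scoring classes for the first (only) image in the request
    return resp.json()['images'][0]['classifiers'][0]['classes']

print(classify('cat_0_styled.png'))  # e.g. [{'class': 'crazy quilt', 'score': 0.82}, ...]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
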
&lt;h3 id=&quot;results&quot;&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*SYEKiqjl5IWEnRkT67T2Gw.png&quot; alt=&quot;&quot; /&gt; Histogram of the top inferred labels&lt;/p&gt;

&lt;p&gt;In the figure above, we see the counts of the most probable classes that the API returned. As seen, the top 4 classes that encompassed more than &lt;em&gt;50%&lt;/em&gt; of the test images were &lt;em&gt;crazy quilt, camouflage, mosaic and patchwork&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In the figure below, we see the scores as well as the histogram of scores related to the 200 classification trials.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*eaEtyK0tX9ivCbWGlvzM8A.png&quot; alt=&quot;&quot; /&gt;Scores and histogram of scores returned by the Watson classifier for the 200 test images&lt;/p&gt;

&lt;p&gt;As seen, we have an overwhelmingly large number of cases where the mis-classifications were made with high associated confidence scores. In the figure below, we see the 5 images that the API classified correctly.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*s8XuO5igmFLCxxTRkkVi5A.png&quot; alt=&quot;&quot; /&gt;The lucky 5: Correctly classified as ‘Cat’ by Watson&lt;/p&gt;

&lt;p&gt;Now, in this figure, we see 6 randomly chosen examples of style transferred images that were classified incorrectly.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*hO2hcirlCr-cen43Wo4ECA.png&quot; alt=&quot;&quot; /&gt;6 random not-so luckies&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion and Future Work&lt;/strong&gt;&lt;br /&gt;
Due to limitations on API usage for free-tier users, we could not extend the experiment to larger datasets, which is our immediate goal. Besides this, another question that we would like to explore is the choice of the style image. We selected an image from the texture dataset [1] for 2 reasons. The first is that a pre-trained style transfer model was readily available. The second was a hunch that texture would in fact be the right aspect of the image to &lt;em&gt;perturb&lt;/em&gt; to induce a mis-classification.&lt;br /&gt;
As stated in the prelude, our intention is not to proclaim a new black-box attack or to berate the commercial API.&lt;/p&gt;

&lt;p&gt;Besides showcasing the potential of looking at style transfer as an adversarial example generating technique, we also wanted to draw attention to the inherent fuzziness that surrounds the definition of what constitutes an image class/category or ‘tags’ in the case of such APIs and what entails an image mis-classification.&lt;br /&gt;
The API that we used &lt;a href=&quot;https://www.ibm.com/watson/services/visual-recognition/index.html%5Coverview&quot;&gt;describes&lt;/a&gt; the technology as: &lt;strong&gt;&lt;em&gt;Watson Visual Recognition’s category-specific models enable you to analyze images for scenes, objects, faces, colors, foods, and other content&lt;/em&gt;&lt;/strong&gt;. With regards to the specific &lt;a href=&quot;https://www.ibm.com/watson/developercloud/visual-recognition/api/v3/curl.html?curl%5Cget-classify&quot;&gt;API documentation&lt;/a&gt;, it was stated that upon usage with Pre-trained models (in lieu of a custom trained classifier), the API &lt;em&gt;Returns classes from thousands of general tags.&lt;/em&gt;&lt;br /&gt;
On a concluding note, we would like to remark that we also ascertained the efficacy of these style-transfer-based black-box attacks using the universal adversarial images for different deep-nets from [3] as the style image, the results of which we plan to disseminate in the full version of this work.&lt;/p&gt;

&lt;h3 id=&quot;links&quot;&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;(This work will be presented at the &lt;a href=&quot;http://vision.soic.indiana.edu/bright-and-dark-workshop-2018/&quot;&gt;CV-COPS workshop&lt;/a&gt; @ CVPR-2018)&lt;/p&gt;

&lt;p&gt;Github: &lt;a href=&quot;https://github.com/vinayprabhu/Art_Attack&quot;&gt;https://github.com/vinayprabhu/Art_Attack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Poster:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://cdn-images-1.medium.com/max/800/1*DrdqZi6P3xjZi54gLuJyfg.png&quot; alt=&quot;&quot; /&gt;Poster for the paper&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 3606–3613. IEEE, 2014.&lt;/p&gt;

&lt;p&gt;[2] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. arXiv preprint arXiv:1705.06830, 2017.&lt;/p&gt;

&lt;p&gt;[3] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. &lt;a href=&quot;https://arxiv.org/abs/1610.08401&quot;&gt;https://arxiv.org/abs/1610.08401&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] Z.Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600– 612, 2004.&lt;/p&gt;
</description>
        <pubDate>Wed, 20 Jun 2018 19:21:03 +0000</pubDate>
        <link>https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2018/06/20/cats-to-crazy-quilts-using-style-transfer-to-generate-adversarial-examples/</link>
        <guid isPermaLink="true">https://vinayprabhu.github.io/alignchronicles/research/computer-vision/2018/06/20/cats-to-crazy-quilts-using-style-transfer-to-generate-adversarial-examples/</guid>
        
        
        <category>research</category>
        
        <category>computer-vision</category>
        
      </item>
    
  </channel>
</rss>
