this post was submitted on 14 Feb 2025
474 points (96.8% liked)

No Stupid Questions

I'm a tech-interested guy. I've touched SQL once or twice, but wasn't able to really make sense of it. That, combined with not having a practical use for it, leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databases).

With that, I keep seeing [pic related] as proof that Elon Musk doesn't understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I'd love if you could pass technical documentation for that.

[–] valtia@lemmy.world 28 points 6 days ago* (last edited 6 days ago) (2 children)

There can be duplicate SSNs due to name changes of an individual, that's the easiest answer. In general, it's common to just add a new record in cases where a person's information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That's how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
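A minimal sketch of the history-keeping pattern described above (a "type 2" slowly changing dimension), using Python's built-in `sqlite3`. The table name, column names, and sample values are made up for illustration; nothing here reflects the real SSA schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical history table: every change adds a row instead of overwriting.
conn.execute("""
    CREATE TABLE person_history (
        row_id     INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL,
        full_name  TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT            -- NULL marks the current record
    )
""")
conn.executemany(
    "INSERT INTO person_history (ssn, full_name, valid_from, valid_to) "
    "VALUES (?, ?, ?, ?)",
    [
        ("000-00-0001", "Jane Doe", "1990-01-01", "2015-06-30"),
        ("000-00-0001", "Jane Smith", "2015-06-30", None),  # after a name change
    ],
)

# The same SSN legitimately appears on two rows.
rows_for_ssn = conn.execute(
    "SELECT COUNT(*) FROM person_history WHERE ssn = ?", ("000-00-0001",)
).fetchone()[0]

# The current record is the one with no end date.
current_name = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? AND valid_to IS NULL",
    ("000-00-0001",),
).fetchone()[0]

print(rows_for_ssn, current_name)
```

Removing "duplicate" SSNs from a table like this would erase the person's history rather than any fraud.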

Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases, with their own structure and design, that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since in real life the SSN on your card has dashes, and those dashes make the number a string. If the SSN is stored as a string in a state database, then when it's brought over to the federal database (assuming the federal db uses a number field instead of text), there can be some data loss, resulting in a NULL.
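A rough illustration of how that kind of loss could happen. The function `ssn_text_to_int` is a hypothetical, deliberately naive import step, not anything known about a real state or federal pipeline.

```python
def ssn_text_to_int(raw):
    """Hypothetical naive import: text SSN -> integer column value, or None (NULL)."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Anything that is not purely digits ends up as a NULL in the target column.
        return None

with_dashes = ssn_text_to_int("000-12-3456")  # dashes are not digits
digits_only = ssn_text_to_int("000123456")    # converts, but leading zeros are lost
missing = ssn_text_to_int(None)               # no value at all in the source

print(with_dashes, digits_only, missing)
```

Even the "successful" conversion drops the leading zeros, which is one reason identifiers like this are often stored as text.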

[–] DarthKaren@lemmy.world 3 points 6 days ago (1 children)

JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.

[–] GoodEye8@lemm.ee 2 points 6 days ago

Hypothetically you could have a separate "previous names" table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it's hard to say how wrong Musk is. But it's obvious he doesn't know what he's talking about because we know that due to human error SSN-s are not unique and you can't enforce uniqueness on SSN-s without completely fucking up the system. Complaining about it the way he did indicates that he doesn't really understand why things are the way they are.

[–] DreamlandLividity@lemmy.world 2 points 6 days ago* (last edited 6 days ago) (1 children)

Another accusation Elon made was that payments are going to people missing SSNs.

~~A much simpler answer is that~~ not all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don't get an SSN.

[–] lovely_reader@lemmy.world 3 points 6 days ago (1 children)

It's true that some Americans don't have Social Security numbers, but those Americans can't collect Social Security benefits unless/until they get one.

[–] DreamlandLividity@lemmy.world 2 points 6 days ago

My bad, I thought it was about payments in general (including other programs) but it says social security database. Sorry.

[–] nednobbins@lemm.ee 15 points 6 days ago* (last edited 6 days ago) (1 children)

It’s so basic that documentation is completely unnecessary.

“De-duping” could mean multiple things, depending on what you mean by “duplicate”.

It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.

It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.

A database system as large as the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
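To make the two meanings of "duplicate" concrete, here is a small sketch with an invented `contributions` table in SQLite. The schema and values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributions (ssn TEXT, paid_on TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [
        ("000-00-0001", "2024-01-31", 12000),
        ("000-00-0001", "2024-02-29", 12000),  # same SSN, different month
        ("000-00-0002", "2024-01-31", 9000),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM contributions").fetchone()[0]

# Meaning 1: whole-row duplicates. DISTINCT removes nothing here.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM contributions)"
).fetchone()[0]

# Meaning 2: a single value repeating. This is expected in a transaction log.
repeated_ssns = [
    row[0]
    for row in conn.execute(
        "SELECT ssn FROM contributions GROUP BY ssn HAVING COUNT(*) > 1"
    )
]

print(total_rows, distinct_rows, repeated_ssns)
```

The table has no fully duplicated rows, yet one SSN repeats, and both facts are perfectly normal.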

[–] MathiasTCK@lemmy.world 9 points 6 days ago (2 children)

SSN is also not a valid unique key, there have been situations with multiple people issued the same SSN:

https://en.wikipedia.org/wiki/Social_Security_number

[–] nednobbins@lemm.ee 2 points 5 days ago

Yeah. And the fix for that has nothing to do with "de-duping" as a database operation either.

The main components would probably be:

  1. Decide on a new scheme (with more digits)
  2. Create a mapping from the old scheme to the new scheme. (that's where existing duplicates would get removed)
  3. Let people use both during some transition period, after which the old one isn't valid any more.
  4. Decide when you're going to stop issuing old SSNs and only issue new ones to people born after some date.

There's a lot of complication in each of those steps but none of them are particularly dependent on "de-duped" databases.

[–] DacoTaco@lemmy.world 2 points 6 days ago (1 children)

Just read the format of the US SSN on that Wikipedia page. That wasn't a smart format to use lol. Only supports 99*999 ( +/- 100k ) people per area code. No wonder numbers are reused.
In some countries it's birthday+sequence number encoded with gender+checksum, and that has been working since the 80's.
Before that it was a different number, but it wasn't future proof like the US SSN, so we migrated away in the 80's :')

[–] Wispy2891@lemmy.world 3 points 6 days ago (1 children)

In my country the only way that someone has the same number is if someone was born on the same day (±1 century), in the same city, and has the same name and family name. It is extremely difficult to have duplicates that way (exception: immigrants, because the "city code" is the same for the whole foreign country, so it's not impossible that there are two Ananya Guptas born on the same day in the whole of India)

[–] DacoTaco@lemmy.world 2 points 6 days ago* (last edited 6 days ago)

Oh ye, our system wouldn't fit India as it's limited to 500 births a day ( the sequence is 3 digits, and whether it's even or odd describes your gender ). Your system seems fine to me and beats the US system hands down haha

[–] Garlicsquash@lemmings.world 15 points 6 days ago (1 children)

Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn't understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.
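A sketch of the primary key / foreign key relationship described above, in SQLite. This is the commenter's guess at a schema rendered as a toy example; the `people` and `payments` tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE people (ssn TEXT PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE payments (
        payment_id   INTEGER PRIMARY KEY,
        ssn          TEXT NOT NULL REFERENCES people(ssn),
        amount_cents INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO people VALUES ('000-00-0001', 'Jane Doe')")
conn.executemany(
    "INSERT INTO payments (ssn, amount_cents) VALUES (?, ?)",
    [("000-00-0001", 150000)] * 3,  # three payments to the same person
)

# The SSN repeats in the child table by design: it is the link to the parent row.
payments_for_ssn = conn.execute(
    "SELECT COUNT(*) FROM payments WHERE ssn = '000-00-0001'"
).fetchone()[0]

# A payment pointing at an SSN with no parent row is rejected.
try:
    conn.execute("INSERT INTO payments (ssn, amount_cents) VALUES ('000-00-0009', 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(payments_for_ssn, orphan_rejected)
```

"De-duplicating" the `ssn` column in `payments` would delete two of the three legitimate payments.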

[–] werefreeatlast@lemmy.world 5 points 6 days ago

If this is true, then by this time we are all fucked. Like Monday someone checks their banking or retirement and it's all gone. That's gonna be a crazy day.

I hope they're not using the actual SSN as the primary key. I hope it's a big ass number that is otherwise unrelated.

[–] RabbitBBQ@lemmy.world 17 points 6 days ago* (last edited 6 days ago) (1 children)

It's more than just SQL. Social Security Numbers can be re-used over time. It is not a unique identifier by itself.

[–] KillingTimeItself@lemmy.dbzer0.com 1 points 6 days ago (1 children)

i've heard conflicting reports on this, i have no idea to what degree this is true, but i would be cautious about making this statement unless you demonstrate it somehow.

[–] DacoTaco@lemmy.world 1 points 6 days ago* (last edited 6 days ago) (1 children)

As read on wikipedia ( https://en.wikipedia.org/wiki/Social_Security_number ) the format only allows +/- 100k numbers per area code ( which is also limited to 999 codes? ), so over time you are forced to reuse some codes. In total the format allows 99m unique codes, and the us currently has 334mil people sooooo :')

[–] KillingTimeItself@lemmy.dbzer0.com 1 points 5 days ago (1 children)

On June 25, 2011, the Social Security Administration changed the SSN assignment process to "SSN randomization", which did the following:

The Social Security Administration does not reuse Social Security numbers. It has issued over 450 million since the start of the program, about 5.5 million per year. It says it has enough to last several generations without reuse and without changing the number of digits. https://www.ssa.gov/history/hfaq.html

evidently they must be doing something else on the backend for this to be working, assuming there are quite literally 100M numbers, which is going to be static due to math, obviously, but they clearly can't be reassigning numbers to 3 people on average at any given time, without some sort of external mechanism.

There are approximately 420 million numbers available for assignment.

https://www.ssa.gov/employer/randomization.html

that certainly doesnt seem like it would support several generations, possibly at our current birth rate i suppose.

DDG AI bullshit tells me that there are a billion codes. https://www.marketplace.org/2023/03/10/will-we-ever-run-out-of-social-security-numbers/ this article says it's 1 billion

https://www.ssn-verify.com/how-many-ssns

this website also lists it as approximately 1 billion.

[–] DacoTaco@lemmy.world 2 points 5 days ago (1 children)

I think i see the change. They are mentioning the ssn is 9 numbers long, which is 1 longer than the 3-3-2 format wikipedia mentions. That does mean its around 999mil numbers, which ye allows for a few generations ( like, 1 or 2 lol )

yeah, that sounds about right, ok i think we've figured this one out now. lol
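The arithmetic behind the "about a billion" figure can be sketched as follows, assuming the usual 3-2-4 digit layout (area, group, serial). The exclusions used here (areas 000, 666 and 900-999, group 00, serial 0000) are the commonly cited never-assigned ranges; treat the result as a back-of-the-envelope ceiling rather than an official count.

```python
# Nine decimal digits give this many raw combinations.
total_combinations = 10 ** 9

# Assumed exclusions: area 000, 666, and 900-999; group 00; serial 0000.
valid_areas = 1000 - 1 - 1 - 100   # 898
valid_groups = 100 - 1             # 99
valid_serials = 10000 - 1          # 9999

assignable = valid_areas * valid_groups * valid_serials

print(total_combinations, assignable)
```

Subtracting the roughly 450 million numbers the SSA says it has already issued from that ceiling lands in the same ballpark as the "approximately 420 million" available figure quoted above.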

[–] KillingTimeItself@lemmy.dbzer0.com 10 points 6 days ago* (last edited 6 days ago) (1 children)

TL;DR de-duplication in that form refers to a technique where two different pieces of data in the file system point to one single piece of data on the drive, the intention being to optimize file storage size and minimize fragmentation.

You can imagine this would be very useful when taking backups for instance, we call this a "Copy on Write" approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size, and both copies of the file (its more complicated than this obviously, but you get the idea)

now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn't remove "duplicates" it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.
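A toy version of storage-level de-duplication, to show why it doesn't remove anything a user can see. `DedupStore` is a made-up in-memory class keyed by content hash; real file systems work on blocks and are far more involved.

```python
import hashlib

class DedupStore:
    """Hypothetical content-addressed store: identical data is kept only once."""

    def __init__(self):
        self.blocks = {}  # content digest -> bytes
        self.files = {}   # file name -> content digest

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # reuse the block if it already exists
        self.files[name] = digest

    def read(self, name):
        return self.blocks[self.files[name]]

store = DedupStore()
store.write("a.txt", b"same bytes")
store.write("b.txt", b"same bytes")
blocks_before_edit = len(store.blocks)  # two files, one stored copy

# Editing b.txt writes a new block; a.txt still points at the original.
store.write("b.txt", b"edited bytes")
blocks_after_edit = len(store.blocks)

print(blocks_before_edit, blocks_after_edit, store.read("a.txt"))
```

Both files stay readable the whole time; only the physical storage is shared.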

The problem here, as a non programmer, is that i don't understand why you would ever de-duplicate a database. Maybe there's a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove "duplicate" entries, however that's supposed to work)

Elon doesn't know what "de-duplication" is, and i don't know why you would ever want that in a DB, seems like a really good way to explode everything.

[–] valtia@lemmy.world 2 points 6 days ago (2 children)

i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another

Well, there's not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you'd just update the row (replace). It depends on the use case of a given table.

what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person's name and details on it. Yes, it's an extremely dumb idea, but he's a famously stupid person.

[–] KillingTimeItself@lemmy.dbzer0.com 1 points 5 days ago (1 children)

Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

in this case you would just overwrite the existing row, you wouldn't use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.

Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

and naturally, he doesn't know what the term "de-duplication" means. Definitionally, the actual identity of the person MUST be unique, otherwise you're going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

[–] valtia@lemmy.world 1 points 5 days ago (1 children)

in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.

... That's what I said, you'd just update the row, i.e. replace the existing data, i.e. overwrite what's already there

Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

... I don't think you understand how modern databases are designed

… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

you were talking about not keeping historical data, which is one of the proposed reasons you would have "duplicate" entries, i was just clarifying that.

… I don’t think you understand how modern databases are designed

it's my understanding that when it comes to storing data that it shouldn't be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that's irrelevant to the discussion of de-duplication aside from data consolidation. Which i don't imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.

Also, we're talking about the DB used for the social security database, not fucking tigerbeetle.

[–] DacoTaco@lemmy.world 1 points 6 days ago* (last edited 5 days ago) (1 children)

SSN being unique isn't a dumb idea, it's a very smart idea, but due to the US SSN format it's impossible to do. Hence to implement the idea you need to change the SSN format so it is unique before then.

Also, Elon's remark is stupid as is. I'm sure the row has a unique id, even if it's just a rowid column.

[–] KillingTimeItself@lemmy.dbzer0.com 1 points 5 days ago* (last edited 5 days ago)

Also, Elon's remark is stupid as is. I'm sure the row has a unique id, even if it's just a rowid column.

even then, i wonder if there's some sort of "row hash function" that takes a hash of all the data in a single entry, and generates a universally unique hash of that entry, as a form of "global id"
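One way such a "row hash" could look, as a sketch only. The `row_hash` helper and the sample record are invented here; a cryptographic hash makes collisions between different rows extremely unlikely, though not strictly impossible, and two identical rows will always share a hash.

```python
import hashlib
import json

def row_hash(row):
    """Hypothetical helper: stable SHA-256 fingerprint of a row given as a dict."""
    # Sort keys so column order does not change the result.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

original = {"ssn": "000-00-0001", "full_name": "Jane Doe", "dob": "1950-03-01"}
reordered = {"dob": "1950-03-01", "full_name": "Jane Doe", "ssn": "000-00-0001"}
renamed = {"ssn": "000-00-0001", "full_name": "Jane Smith", "dob": "1950-03-01"}

same_content_same_hash = row_hash(original) == row_hash(reordered)
changed_content_new_hash = row_hash(original) != row_hash(renamed)

print(same_content_same_hash, changed_content_new_hash)
```

A fingerprint like this detects whole-row duplicates or changes, which is a different job from a surrogate key such as a rowid.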

[–] rational_lib@lemmy.world 12 points 6 days ago* (last edited 6 days ago) (3 children)

To me I'm not really sure what his reply even means. I think it's some attempt at a joke (because of course the government uses SQL), but I figure the joke can be broken down into two potential jokes that fail for different, embarrassing reasons:

Interpretation 1: The government is so advanced it doesn't use SQL - This interpretation is unlikely given that Elon is trying to portray the government as in need of reform. But it would make more sense if coming from a NoSQL type who thinks SQL needs to be removed from everywhere. NoSQL Guy is someone many software devs are familiar with who takes the sometimes-good idea of avoiding SQL and takes it way too far. Elon being NoSQL Guy would be dumb, but not as dumb as the more likely interpretation #2.

Interpretation 2: The government is so backward it doesn't use SQL - I think this is the more likely interpretation as it would be consistent with Elon's ideology, but it really falls flat because SQL is far from being cutting-edge. There has kind of been a trend of moving away from SQL (with considerable controversy) over the last 10 years or so and it's really surprising that Elon seems completely unaware of that.

[–] dnick@sh.itjust.works 2 points 6 days ago

My guess is that he thinks SQL is an app or implementation like MS-SQL. It would be pretty surprising if the government didn't use SQL as in relational databases, but if it doesn't it's even more unlikely that he understands even the first part over whether having duplicate SS numbers is in any way unexpected or unreasonable. Most likely one of the junior devs somewhere along the lines misunderstood a query and said something uninformed and mocking, and he took that as a good dig to toss into a tweet.

[–] DahGangalang@infosec.pub 2 points 6 days ago

Thanks for the genuine response. Lol, most who interpret my question the way you did don't seem interested in a good faith discussion. But ol' boy is def tripping if he thinks SQL isn't used in the government.

Big thing I'm intending to pry at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn't understand how SQL works).

it's probably using some sort of proprietary home grown database, because it's probably old enough that no database could support what they needed, could be wrong on that one, but it was my best guess.

[–] TransSynthesist@lemmy.blahaj.zone 11 points 6 days ago (2 children)

139 comments and no one addresses his use of a slur.

[–] localhost443@discuss.tchncs.de 12 points 6 days ago

Because that's really just to be expected at this point, and what his audience would want..

Better to focus on constantly poking at him for being dumb, which he and his fans hate, rather than give them what they want, ie being upset at their hateful language

it seems that nobody really cares about the word retard anymore, it's quite funny how it went from super common language, to being less common, to people just saying it again now.

I'm curious how many people actually consider the word a slur, and how many people even care these days.

If there are timestamped records for things like name changes then you'd get "duped" SSNs

Billionaires are stealing our dollars, tax or otherwise.

[–] TangoNoir@lemm.ee 2 points 6 days ago

He's just a permanent petulant child.

[–] 9point6@lemmy.world 289 points 1 week ago (22 children)

The statement "this [guy] thinks the government uses SQL" demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
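A sketch of what a composite key of that shape would mean in practice, using SQLite. Whether the real table is keyed this way is the commenter's claim and is not verified here; the `ss_records` table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ss_records (
        ssn           TEXT NOT NULL,
        date_of_birth TEXT NOT NULL,
        full_name     TEXT NOT NULL,
        PRIMARY KEY (ssn, date_of_birth)  -- uniqueness applies to the pair
    )
""")
conn.execute("INSERT INTO ss_records VALUES ('000-00-0001', '1950-03-01', 'Jane Doe')")
# Same SSN, different date of birth: allowed, because the pair is different.
conn.execute("INSERT INTO ss_records VALUES ('000-00-0001', '1987-11-15', 'John Roe')")

# Repeating the whole key is what the constraint actually forbids.
try:
    conn.execute(
        "INSERT INTO ss_records VALUES ('000-00-0001', '1950-03-01', 'Someone Else')"
    )
    full_key_duplicate_rejected = False
except sqlite3.IntegrityError:
    full_key_duplicate_rejected = True

rows_sharing_ssn = conn.execute(
    "SELECT COUNT(*) FROM ss_records WHERE ssn = '000-00-0001'"
).fetchone()[0]

print(rows_sharing_ssn, full_key_duplicate_rejected)
```

Under a key like this, a query that looks only at the `ssn` column will report "duplicates" that the database itself considers valid.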

If he knew the domain, he would know this isn't an issue. If he knew the technology, he would be able to see the constraint and, following investigation, reach the conclusion that it's not an issue.

The man continues to be a malignant moron
