Wednesday, May 7, 2014

Personal Notes On Computer Science Skills [Unofficial]

Computer Science Skillsets I Am Focusing On
  • Skillset 1: Software Engineering
    • Programming Languages [1] [2]
    • Software Engineering skills, tools and processes [3] [4] [5] [6]
  • Skillset 2: Theory & Algorithms 
    • Theoretical Computer Science
      • Algorithms [1] [2]
      • Data Structures
      • Language, Automata & Discrete Mathematics
    • Computational Science & Engineering
  • Skillset 3: Systems
    • Systems Programming
    • Computer Security
    • Databases: RDBMS and NoSQL
    • Cloud and Mobile Development [1] [2]
    • Parallel, Multicore and Concurrent Programming [3]
    • Networked and Distributed Systems Programming [4] [5] [6] [7]
  • Skillset 4: Intelligence & Data 
    • Machine Learning [1]
    • Data Science & Analytics, Big Data 
    • Artificial Intelligence [2]
  • Skillset 5: Physical Digital [1] [5] 
    • Robotics & Manufacturing [2] [3] [4] [6]
    • Internet Of Things
  • Skillset 6: Info Bio 
    • Computational Biology & Bioinformatics
    • Systems Biology [1]
  • Skillset 7: Interactive Computing
    • Education [1] [2] 
    • Wikinomics [3] [4]

What is the least you need to know so that you can develop any software?


Programming language: structure the main syntactic elements


Algorithms and Data Structures
  • Structure all the data structures
Software Development, OOA&D, Design Patterns, Functional Programming, Software Engineering Tools


Can you develop prototypes from scratch?
Master existing systems - interacting with them, their internals, how they could be made to work better.
Compilers
Operating Systems
Database: Relational, NoSQL
Networking
Parallel & Distributed Computing


Computer Architecture


Machine Learning
Language Processing


Multicore, Networked & Distributed Programming




How do you make developers orders of magnitude (say 100 times) more productive?
How do you develop software so that it has fewer bugs?
  • Support for Abstractions, Components in language
  • Start with Scala DSL, Clojure Macros, and Code generation (from XML / DSL - synchronization with code, CLI / Interpreter).
  • Library, Framework, Plugin, Middleware, Reusable infrastructure (e.g., Akka)
  • Classify, Categorize all the different bugs, Test Driven Design


Programming Language Paradigms
  1. Imperative, State (Memory) manipulation based, Assignment oriented Programming Language
  2. Functional Programming Language
  3. Object Oriented Programming Language
  4. Rule-based, Logic-based Programming Language


Most languages have a hybrid philosophy.
Scala - Object-Functional Programming Language


Blogposts
  • Functional Programming
  • What makes Clojure different
  • Scala
  • Dependency Injection
  • Parallel Programming Models



Concurrent Programming Models


Parallel Programming: Multiple-processor or multicore programming


Concurrent Programming: Multithreaded programming


Concurrent Programming Models:
  1. Threads and lock-based Synchronization
    1. Java
  2. Functional programming model
    1. Pure Functions with no side-effects + Immutable Data Structures
  3. STM (Software Transactional Memory)
    1. Clojure
  4. Actor based message passing
    1. Scala
    2. Erlang
    3. Clojure (agents, a related but not identical model)
  5. Channel based message passing:
    1. Go
    2. Unix processes-pipes
  6. Non-blocking I/O or asynchronous I/O
    1. Callback Functions
      1. Node.js: Event-driven programming and callback functions.
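
To make two of the models above concrete, here is a minimal Python sketch (Python is my choice for illustration; the notes do not prescribe it). `locked_counter` shows model 1, shared state guarded by a lock, and `channel_sum_of_squares` shows model 5, workers that share nothing and talk only through queues. Both function names and the sentinel convention (`None` meaning "no more work") are assumptions made for this example.

```python
import queue
import threading


def locked_counter(num_threads=4, increments=1000):
    """Model 1: threads share one counter and guard it with a lock."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:  # only one thread may update the counter at a time
                total += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def channel_sum_of_squares(values, num_workers=4):
    """Model 5: workers receive jobs and send results through queues."""
    jobs = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel (assumed convention): no more work
                break
            results.put(item * item)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for v in values:
        jobs.put(v)
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return sum(results.get() for _ in range(len(values)))
```

In the queue version no worker ever touches another worker's data, which is the property Go channels and Unix pipes rely on as well.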



Programming Language
Components, Composition of Components, Parameterization



Functional Programming


First-class functions
Functions are first-class citizens, which means:
  • functions can be assigned to variables
  • functions can be stored in data structures
  • functions can be passed to functions as arguments
  • functions can be returned from functions
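
The four properties above can be sketched in a few lines of Python. The helper names (`square`, `compose`, `apply_twice`) are made up for this illustration.

```python
def square(x):
    return x * x


def apply_twice(f, x):
    """Takes a function as an argument and applies it two times."""
    return f(f(x))


def compose(f, g):
    """Returns a new function that applies g first, then f."""
    def composed(x):
        return f(g(x))
    return composed


# 1. Assigned to a variable
sq = square

# 2. Stored in a data structure
operations = {"square": square, "negate": lambda x: -x}

# 3. Passed to a function as an argument
fourth_power_of_three = apply_twice(square, 3)

# 4. Returned from a function
negated_square = compose(operations["negate"], square)
```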


Pure functions without any side effects
  • Functions take values as parameters and return values.
  • No global or mutable state.
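
A small Python contrast may help here. The first function is impure because it depends on a global, and the other two are pure. All names are hypothetical and exist only for this sketch.

```python
running_total = 0  # global, mutable state


def add_to_running_total(amount):
    """Impure: reads and writes a global, so the result depends on call history."""
    global running_total
    running_total += amount
    return running_total


def add(total, amount):
    """Pure: the result depends only on the arguments."""
    return total + amount


def with_item(items, item):
    """Pure: returns a new tuple instead of mutating the input."""
    return items + (item,)
```

Calling `add(10, 5)` gives the same answer every time, while two identical calls to `add_to_running_total(5)` give different answers, which is what makes the impure version harder to test in isolation.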


What do these two features lead to?
  1. Localized thinking space
  2. Localized testing
  3. Control abstraction with higher order functions
  4. More readable & shorter code
    1. Higher-order functions lead to fewer branches and assignments, which in turn leads to more readable, shorter code.
  5. Data abstraction with closures
  6. Concurrency - immutable data structures
  7. Simplifies programming; No need for complicated OO Design Patterns, which are required to solve problems that OO introduces.
  8. Higher-order functions, composition of functions at runtime, function types, anonymous functions / function literals, partial functions, currying
  9. Distributed programming - map-reduce
  10. Loose coupling?
  11. Reusable? Replaceable components?
  12. Flexible? Extendable?
  13. Real world modeling? Functions take in data, and return data
  14. Recursion. Memoization. Dynamic Programming. Immutability.
  15. Monads
  16. Pattern Matching
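
Several items in this list (higher-order functions, closures, memoization, partial application as a close cousin of currying) fit into one short Python sketch. The function names are assumptions for illustration, and `functools.partial` is partial application, not true currying.

```python
from functools import lru_cache, partial, reduce


def make_scaler(factor):
    """Closure: `factor` is captured and hidden inside the returned function."""
    def scale(x):
        return x * factor
    return scale


@lru_cache(maxsize=None)
def fib(n):
    """Recursion plus memoization: each fib(k) is computed only once."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def power(base, exponent):
    return base ** exponent


# Partial application: fix one argument to get a more specific function.
square = partial(power, exponent=2)


def sum_of_even_squares(numbers):
    """Control abstraction: filter, map and reduce replace loops and branches."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    return reduce(lambda acc, x: acc + x, map(square, evens), 0)
```

None of these functions assigns to shared state, so each one can be tested on its own, which is the "localized testing" point from the list.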

(OOP - interacting objects)


Function returning output(s)?


C++ Function pointers, function objects





Scalable Web Applications


Mobile (HTML5)
Cloud (Big Data, NoSQL)
Social
Security


Programming Languages I Am Learning / Working With
Language - Why
C/C++ - Systems Programming. Efficiency.
Go - Systems, Concurrent and Networked Programming. Static Typing. Faster compilation.
Java - Managed Code. Android Development. Open Source Libraries and Frameworks.
Scala - A blend of all the features you saw in different languages. OOP and Functional Programming on JVM. Static Typing. Terse syntax.
Clojure - Lisp Features on JVM. Metaprogramming (Programmable Language). Parallel Programming.
Python - Rapid Development. Open Source Libraries and Frameworks.
JavaScript - Web Front-end. Object-based Programming. Node.js.
R - Statistical, Numerical Computing
Haskell - Pure Functional Programming
Erlang - Fault tolerant Parallel Distributed Programming

Language philosophy
Language features - Sentences, Control, Abstractions


C/C++
Memory manipulation. Writing Systems Software. Efficiency.
Generic Programming.
Figure out how the STL is implemented so that you can implement it on your own.
Boost Library.



Virtualization


Computer Security


Physical Digital
  1. Systems, Linux, Drivers, Modules, Android
  2. Image Processing, Computer Vision; Signal Processing; Robotics; Machine Learning; Knowledge Representation, Probabilistic Agents, Planning; Semantic Web, Networking
  3. Computer Architecture, Microprocessor; Arduino / Raspberry Pi
  4. Sensors
  5. MEMS
  6. Electronics; Control Engineering; Machines
  7. Security
  8. Physics



Scala
  1. Akka
  2. Play
  3. Spray
  4. Spark, Spark Streaming, MLlib, GraphX, Shark
  5. Storm
  6. Scalaz
  7. Lift
  8. Stratosphere


Clojure
  1. Incanter
  2. Ring, Compojure
  3. core.logic

OOAD
OO Design Patterns:
  1. Strategy
  2. Factory
  3. Dependency Injection
  4. Publisher-subscriber
  5. Service Oriented Architecture

Cross Language Development
  1. Thrift
  2. JVM


Java


Java Language & Standard Library:
  1. Language features: Class, Object; Garbage Collection; Inheritance, Polymorphism, Interface; Nested Type; Package; Assertion; Annotation; Generics; Enum; Exception
  2. Data Manipulation API: Math; Random number; BigDecimal; BigInteger; Geometry; String, Character, Regular Expression; Primitive Wrapper; Array; Collections; XML Processing
  3. Development API: Internationalization; Preferences; References; Reflection; JMX
  4. Systems and Network Programming API: System; GUI, Swing, AWT; Threading (Thread & Lock); Concurrency; Networking - Protocols; Servlet, JSP; Web Services; File; JDBC

Java Vital Techniques:
  1. Concurrency
  2. Dependency Injection, IoC
  3. AOP
    1. AspectJ
  4. Modular Java
    1. OSGi
  5. Classfiles & Bytecodes
  6. Performance Tuning


Java Libraries & Frameworks
  1. Spring
  2. Android
  3. Play
  4. Hadoop

Java Software Development Tools:
  1. Automation; Increased productivity
  2. Testing
    1. Unit Testing
      1. JUnit
    2. Integration, Functional, Load, Performance Testing
  3. Build
    1. Maven
  4. Continuous Integration
    1. Jenkins
  5. Version Control:
    1. Git
  6. Quality Metrics
  7. Issue Management
    1. Bugzilla
  8. Technical Documentation Tools



Java Standard Library
  • String Handling: String, StringBuilder, StringBuffer, String.split() (StringTokenizer), Text, Character, Pattern, Matcher
  • Collections: Map, List, PriorityQueue, Set, Queue, Arrays, Collections
  • Math: Math, BigInteger, BigDecimal
  • I/O: Scanner, File
  • Exception Handling:


Mobile Development
  1. Android Application Development
  2. Android Internals
  3. Mobile Web


JavaScript
  1. jQuery
  2. Bootstrap
  3. AngularJS
  4. Node.js


HTML5
CSS3


Multicore, Parallel, Networked & Distributed Programming


Parallel Programming
  1. GPU Programming


Cloud Computing
  1. Google App Engine
  2. OpenStack
  3. AWS: Basics

Network Programming


Protocols:
  1. TCP, UDP; TLS, SSL; HTTP, SMTP, POP, IMAP, FTP; RMI;


Server Architecture


Web Framework:
  1. Django
  2. Spring MVC
  3. Play
  4. Ring, Compojure


Scaling Internet Applications:
  1. Cache
    1. Memcached
  2. Message Queue
    1. AMQP
    2. RabbitMQ
  3. Task Queue
    1. Celery
  4. MapReduce
    1. Hadoop
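
The MapReduce idea from this list can be sketched on a single machine in plain Python. This is a toy word count under my own naming (`map_phase`, `shuffle_phase`, `reduce_phase`), not Hadoop's API; a real framework runs the same three phases across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because `map_phase` and `reduce_phase` are pure functions, each call could run on a different machine without coordination, which is why the model scales out.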


Virtualization

Distributed Data Processing Framework
  1. MapReduce Framework
    1. Apache Hadoop
      1. Hadoop Family Projects
        1. YARN
        2. Hive
        3. Pig
        4. HBase
        5. Sqoop (data transfer between Hadoop and RDBMS)
        6. ZooKeeper
        7. Impala
        8. Accumulo
  2. Bulk Synchronous Parallel Model
    1. Apache Hama
  3. Pregel
    1. Apache Giraph
      1. Bulk Synchronous Parallel Computations for processing Graphs  
  4. Stratosphere
  5. Spark
    1. Spark Streaming, MLlib, GraphX, Shark
  6. Storm
  7. Percolator
  8. Dremel

Artificial Intelligence
  1. Language Processing
    1. Statistical
      1. Information Retrieval & Web Search
        1. Lucene, Nutch, Solr
      2. Information Extraction
    2. Natural
      1. Parsing
  2. Adversarial Search
  3. Constraint Satisfaction
  4. First Order Logic
  5. Planning
  6. Probabilistic Models: Bayes, Markov


Data Science & Analytics


Data Mining
  1. Data Warehouse & OLAP
  2. ETL


Big Data
  1. Hadoop


Machine Learning
  1. Inductive Learning
    1. Decision Tree
    2. Ensemble Learning
  2. Statistical Learning
    1. Bayes
    2. Neural Network
      1. Deep Learning
    3. SVM
  3. Knowledge Based Learning
    1. Explanation Based
    2. Relevance Based
    3. Inductive Logic Programming
  4. Reinforcement Learning
  5. Computational Learning Theory
  6. Machine Learning Library
    1. Mahout
    2. scikit-learn


Numerical Algorithm


Database


RDBMS:
  1. MySQL
  2. SQLite


NoSQL:
  1. MongoDB (Document-oriented)
  2. CouchDB (Document-oriented)
  3. Cassandra (Distributed)
  4. Redis (In-memory)
  5. Riak (Dynamo, key-value)
  6. HBase (Bigtable)
  7. Neo4j (Graph)
  8. Semantic Web, linked data

Operating system (& Computer Architecture)
  1. Linux
    1. Loadable kernel modules
    2. Device Drivers, Peripherals and Interfacing
  2. Android Internals

Algorithms and Data Structures


Data Structures
  • Structure all the data structures
  • Linear: Stack, Queue, Linked List, Heap, Binary Indexed Tree, RMQ
  • Tabular: Hashtable
  • Non-Linear: Rooted Trees, Binary Search Trees, Red-Black Trees, B-Trees
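
As one worked example from the list above, here is a minimal Binary Indexed Tree (Fenwick tree) in Python supporting point updates and range sums in O(log n). The class and method names are my own choice for this sketch, and indices are 0-based on the outside.

```python
class FenwickTree:
    """Binary Indexed Tree over `size` numbers, all initially zero."""

    def __init__(self, size):
        self._size = size
        self._tree = [0] * (size + 1)  # internal array is 1-based

    def add(self, index, delta):
        """Add `delta` to the element at 0-based `index`."""
        i = index + 1
        while i <= self._size:
            self._tree[i] += delta
            i += i & -i  # move to the next node that covers this index

    def prefix_sum(self, count):
        """Sum of the first `count` elements (indices 0 .. count - 1)."""
        total = 0
        i = count
        while i > 0:
            total += self._tree[i]
            i -= i & -i  # drop the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Sum of elements with indices lo .. hi, inclusive."""
        return self.prefix_sum(hi + 1) - self.prefix_sum(lo)
```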


Algorithms
  • Computational Complexity
  • Dynamic Programming
  • Graph Algorithms
    • BFS
    • DFS
    • Topological Sorting
    • Strongly Connected Components
    • Minimum Spanning Tree
      • Prim
      • Kruskal
    • Single source shortest Path
      • Dijkstra
    • All Pairs Shortest Paths
      • Floyd-Warshall
    • Maximum Flow
      • Maximum Bipartite Matching
  • Backtracking
  • Greedy Algorithms
  • Divide & Conquer
  • Number Theoretic Algorithms
  • Computational Geometry
  • Matrix Algorithms
  • Algorithmic Game Theory
  • Linear Programming
  • Topcoder Algorithm Tutorials
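
Two of the graph algorithms listed above, sketched in Python: Dijkstra's single-source shortest paths using a binary heap, and topological sorting using Kahn's algorithm. The graph representation (a dict of adjacency lists) is an assumption of this sketch, and Dijkstra requires non-negative edge weights.

```python
import heapq
from collections import deque


def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def topological_sort(dag):
    """Kahn's algorithm; dag maps node -> list of successors."""
    indegree = {u: 0 for u in dag}
    for u in dag:
        for v in dag[u]:
            indegree[v] = indegree.get(v, 0) + 1
    ready = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in dag.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")
    return order
```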


Mathematics & Theory
  1. Combinatorics
  2. Probability
  3. Number Theory
  4. Linear Algebra
  5. Graph Theory
  6. Automata Theory

Software Engineering


Processes:
  1. Agile


Tools:
  1. Automation, Increased productivity
  2. Testing
    1. Unit Testing
    2. Integration, Functional, Load, Performance Testing
  3. Build & Continuous Integration
    1. Maven
  4. Version Control:
    1. Git
  5. Quality Metrics
  6. Issue Management


Computer Architecture

  1. Special purpose processors: GPU, DSP, SIMD, FPGA


