Design Questions
Published: 2019-06-24

1. A text file holds 1 billion records, stored sorted by key. Design an algorithm that quickly finds the record for a given key.

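$1\ \text{billion} = 10^9 \approx 2^{30}$. At 1 kB per record, the file is about 1 TB in total, too large to load into memory at once. Split the file into 1,000 chunks of 1 GB each. Because the file is sorted, the chunks cover disjoint, ordered key ranges, so first compare the target key with the first key of each chunk to find the one chunk that can contain it, then load only that chunk into memory and binary search it.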
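The lookup described above can be sketched as follows. This is a minimal illustration, not a production design: the chunks are small in-memory lists standing in for the 1 GB files, and the names `build_sparse_index` and `lookup` are hypothetical. It assumes each record is a `(key, value)` pair and that keys compare as strings.

```python
import bisect

def build_sparse_index(chunks):
    """Return the first key of every chunk (chunks are sorted lists of (key, value))."""
    return [chunk[0][0] for chunk in chunks]

def lookup(chunks, index, key):
    """Find `key` by binary searching the sparse index, then the matching chunk."""
    # Last chunk whose first key is <= key; -1 means key sorts before everything.
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None
    chunk = chunks[i]  # assumption: in a real system, load this one chunk from disk
    keys = [k for k, _ in chunk]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return chunk[j][1]
    return None

# Toy data: 500 sorted records split into 10 chunks of 50 records each.
records = [(f"key{n:05d}", f"value{n}") for n in range(0, 1000, 2)]
chunks = [records[i:i + 50] for i in range(0, len(records), 50)]
index = build_sparse_index(chunks)
```

With 1,000 chunks the index search takes about 10 comparisons, and the whole lookup stays near the $\log_2 10^9 \approx 30$ comparisons of a plain binary search, while only one chunk is ever read from disk.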

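2. Design a distributed web crawler.

Configuration parameters: start_url, crawl depth, update frequency.

Features: scheduled crawling and updating, deduplication, search; whether to support rules.

Problems: distributed storage, how to deduplicate, disk I/O and network I/O; re-crawling; updating the index once data becomes stale.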

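Estimate the volume up front. For example, if each page has 100 links, crawling 4 levels deep reaches about 100^4 = 10^8 pages; at 100 kB per page, a single crawl may fetch 10 TB of data. Then consider deduplication: if 50% of that turns out to be duplicate, 5 TB remains.

If 20% of it needs periodic updates, each update pass covers about 1 TB.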
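The back-of-the-envelope estimate above can be checked with a few lines of arithmetic. The function name `crawl_estimate` and its parameter names are made up for this sketch; the numbers are the ones assumed in the text (100 links per page, depth 4, 100 kB per page, 50% duplicates, 20% refreshed periodically).

```python
TB = 10**12  # decimal terabyte, in bytes

def crawl_estimate(links_per_page, depth, page_bytes, duplicate_pct, update_pct):
    """Return (pages, raw_bytes, unique_bytes, update_bytes) for one crawl."""
    pages = links_per_page ** depth          # upper bound: every link is a new page
    raw_bytes = pages * page_bytes
    unique_bytes = raw_bytes * (100 - duplicate_pct) // 100
    update_bytes = unique_bytes * update_pct // 100
    return pages, raw_bytes, unique_bytes, update_bytes

pages, raw, unique, update = crawl_estimate(
    links_per_page=100, depth=4, page_bytes=100 * 10**3,
    duplicate_pct=50, update_pct=20,
)
print(pages, raw // TB, unique // TB, update // TB)
```

Integer percentages keep the arithmetic exact, which makes it easy to vary one assumption at a time and see how the storage requirement moves.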

http://blog.sina.com.cn/s/blog_59c4ac5501017wda.html

http://www.douban.com/group/topic/38361104/

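3. Design a mobile cloud push service based on persistent connections. How would you manage connections (dropped connections, connection lookup) at the scale of millions of persistent connections, and how would you handle fault tolerance?

4. News feeds.

5. A distributed caching scheme.
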

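When doing system design, I find it helps to know the following:

  • Horizontal scaling versus vertical scaling;
  • Whether the workload is read-heavy or write-heavy;
  • Load balancing;
  • DNS; BIND is by far the most widely used DNS software on the Internet, providing a robust and stable platform on top of which organizations can build distributed computing systems with the knowledge that those systems are fully compliant with published DNS standards.
  • Caching, and the avalanche effect that cache systems can suffer (once cached entries expire and have to be reloaded from the database, the burst of concurrent database queries makes responses extremely slow); one decent mitigation is double caching. At work, Redis is used merely as a "Memcached with many data structures" (Memcached as the first-level cache in front of the database, Redis as the second-level cache for business scenarios).
  • Nginx (pronounced "engine x"); on Linux, nginx uses the epoll event model;
  • Data recovery; logs are a great help.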
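The cache avalanche and the "double caching" idea from the list above can be illustrated with a small sketch. The original does not spell out what double caching means, so this follows one common reading as an assumption: each key has a primary copy with a short TTL and a long-lived backup copy, and while one caller reloads an expired key from the database, everyone else is served the stale backup instead of hitting the database. `DoubleCache` and `loader` are hypothetical names, and plain dicts stand in for Memcached or Redis.

```python
import threading
import time

class DoubleCache:
    """Primary copy with a short TTL plus a long-lived backup copy of each key."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # hypothetical function: key -> value ("the database")
        self._ttl = ttl_seconds
        self._clock = clock
        self._primary = {}           # key -> (value, expires_at)
        self._backup = {}            # key -> value; never expires in this sketch
        self._reload_locks = {}      # key -> lock guarding reloads of that key
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._reload_locks.setdefault(key, threading.Lock())

    def get(self, key):
        entry = self._primary.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]          # fresh hit
        lock = self._lock_for(key)
        if lock.acquire(blocking=False):
            try:
                value = self._loader(key)  # only one caller per key reaches the database
                self._primary[key] = (value, self._clock() + self._ttl)
                self._backup[key] = value
                return value
            finally:
                lock.release()
        if key in self._backup:
            return self._backup[key]  # reload in progress: serve the stale backup copy
        with lock:                    # cold key: wait for the in-flight reload to finish
            entry = self._primary.get(key)
            return entry[0] if entry is not None else self._loader(key)
```

The trade-off is that readers may briefly see stale data. Another common mitigation, not shown here, is adding random jitter to TTLs so that many keys do not expire at the same moment.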

A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancers are used to increase capacity (concurrent users) and reliability of applications. They improve the overall performance of applications by decreasing the burden on servers associated with managing and maintaining application and network sessions, as well as by performing application-specific tasks.

Load balancers are generally grouped into two categories: Layer 4 and Layer 7. Layer 4 load balancers act upon data found in network and transport layer protocols (IP, TCP, UDP). Layer 7 load balancers distribute requests based upon data found in application layer protocols such as HTTP.

Requests are received by both types of load balancers and they are distributed to a particular server based on a configured algorithm. Some industry standard algorithms are:

  • Round robin
  • Weighted round robin
  • Least connections
  • Least response time

Layer 7 load balancers can further distribute requests based on application specific data such as HTTP headers, cookies, or data within the application message itself, such as the value of a specific parameter.

Load balancers ensure reliability and availability by monitoring the "health" of applications and only sending requests to servers and applications that can respond in a timely manner.
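Three of the algorithms named above can be sketched in a few lines each. This is a toy illustration under stated assumptions: the class names and server names are invented, the weighted variant simply repeats each server `weight` times in the rotation (real balancers usually interleave more smoothly), and health checks are left out.

```python
import itertools

class RoundRobin:
    """Hand out servers in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class WeightedRoundRobin:
    """Naive weighting: each server appears `weight` times in the rotation."""
    def __init__(self, weights):  # dict: server -> positive integer weight
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Send each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self._active = {s: 0 for s in servers}

    def pick(self):
        # Ties go to the server listed first, because dicts keep insertion order.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        self._active[server] -= 1  # call when the connection closes

rr = RoundRobin(["app1", "app2", "app3"])
print([rr.pick() for _ in range(4)])
```

Round robin needs no feedback from the servers, while least connections needs to be told when a connection ends, which is why `release` exists only on that class.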

Reposted from: https://www.cnblogs.com/linyx/p/4018181.html
