I split all the talks I listened to at QCon this time into three categories:
Tech Architecture Optimization
- Tidy First
- Scaling Bottleneck
System Design Concept
- Microservices Patterns
- Orchestration vs Choreography
System Design Implementation
- Adopting Continuous Deployment
- Distributed In-Process Cache
- Leveraging Determinism
- Service Mesh
- DynamoDB in Amazon
- CosmosDB in Azure
- Event Processing Service in DoorDash
- Blob Storage Service in Dropbox
The categories are ordered from abstract to concrete. I know most engineers, like me, are more interested in the low-level implementations, and yes, some of those talks were great (the Determinism one is really, really good). This time, however, the first two categories influenced me the most. Listening to those talks helped me understand the career path better. Sometimes you have to view things from outside the box.
I really appreciate that my company, Meta, sponsored me for this event.
I spent three days at the conference. The first day was mostly academic work from professors and PhD students; the other two days were more about best practices in industry. The topics spanned AI, distributed systems, advanced tooling, product introductions, and personal experiences.
I find conferences really help people think outside the box. I wish I had joined Meta earlier and understood our own implementations, so that I could understand the problems more deeply and compare solutions better.
Research Topics:
AI content generation:
Generally, it is getting more and more popular, though the idea has been around for a while. This talk focused on generating game content: the best practices, how to set up rules, and how to make the generator produce more "comfortable" content for a better game experience.
Breakbeats analysis
A talk about applying AI to breakbeats (pieces of background music used in songs), and how the company could use that to better understand how songs evolve and to help collect songs' patent fees.
Textual analysis of code
A talk about understanding code from the language side, and how this kind of thinking gives people a new way to look at our code patterns and practices. E.g.:
- A trace is one pattern people use to understand a story; it is also a way we find bugs.
- When we do bottom-up implementation, we always start from the central piece of the structure, which is reasonable from the language side.
Product Introduction
Google's universal IPC language: FIDL in Fuchsia
A system much like Thrift, but used deeper in the system, for IPC. It also does bindings, so it supports different languages and is easy to extend to other languages.
Workflow:
A new way to build durable systems.
A practice that brings DB-style "transactions" into service code, making it easier to write concurrent code and to handle errors.
The idea is to make actions more atomic, and for actions that change state, to make them idempotent (see the sketch below).
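A minimal sketch of that idea in Python (my own toy example with hypothetical names, not the actual product's API): each state-changing action records an idempotency key, so retrying a failed workflow never applies the same action twice.

```python
# Toy illustration of idempotent workflow actions; names are hypothetical.
completed: set[str] = set()  # in a real system this lives in durable storage

def run_once(key: str, action) -> None:
    """Apply a state-changing action at most once per idempotency key."""
    if key in completed:
        return            # already applied; retries become no-ops
    action()              # the side effect itself
    completed.add(key)    # record completion alongside the action

def charge_customer() -> None:
    print("charged order-123")

# The key is derived from business identifiers, never random,
# so a crashed-and-retried workflow reuses the same key:
run_once("order-123/charge", charge_customer)
run_once("order-123/charge", charge_customer)  # no-op on retry
```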
Filibuster:
An end-to-end testing tool focused on error cases, to help understand a big system's failures.
The talk covered why end-to-end testing is needed in a big system, especially for error cases; those tests need to cover multiple failure scenarios.
RLBox:
A way to run native code in a sandbox to avoid native code's memory issues.
The talk covered how native code is unsafe, and how many safe languages are built on top of it.
The approach is to run all the native code in a sandbox to make it safer.
It is already used in Firefox.
It is a way of coding, so it is very easy and clean to add to an existing system.
Akita software:
A product/practice that adds API listeners so developers can understand old systems.
It uses eBPF to attach listeners to the APIs of an old system, and their product provides a solution to better understand the system based on those listeners.
Shopify's solution for flash sales:
Solutions at both the load-balancer layer and the service layer to handle capacity, starvation, and fairness problems.
A very good talk on flash sales; it went into great detail about how they chose between different solutions.
Runtime Checker:
Java best practices for enhancing a system with observation, watchdogs, and in-prod regression tests.
A very good talk about why we need runtime checkers in a system, and why even widely used systems can still have bugs.
The talk covered fixes in three areas:
- For observation, the idea is to use static code analysis tools to find errors that are handled locally but never reported anywhere.
- For watchdogs, catch errors better, so they can be reported to a monitoring system.
- For tests, define testing rules that describe the system, and verify while running the tests that those rules still hold (like behavior testing along the workflows).
WNFS:
A guide on creating a local, encrypted file system.
It covered how to create a good file system, including files and directories, how to keep history, and how to merge versions.
It also covered how to encrypt those files.
Coding Best Practice:
Polymorphism:
Different languages offer different mechanisms for polymorphism: overloading, overriding, interfaces, parametric, structural, union types, ADTs, multiple dispatch... (a toy example follows).
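As a toy illustration (my own example, not from the talk), here are two of those mechanisms in Python: parametric polymorphism via a type variable, and structural polymorphism via Protocol.

```python
from typing import Protocol, TypeVar

T = TypeVar("T")

def first(items: list[T]) -> T:
    """Parametric: one implementation works for any element type."""
    return items[0]

class Quacker(Protocol):
    """Structural: any type with a matching quack() conforms."""
    def quack(self) -> str: ...

class Duck:
    def quack(self) -> str:
        return "quack"

def greet(q: Quacker) -> str:
    return q.quack()

print(first([1, 2, 3]))  # 1
print(greet(Duck()))     # quack
```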
Problem
A large number of users can search for their health code status.
A user's health code is changed to an unhealthy code, based on their time and location, if they have been in contact with another unhealthy code.
Requirement
- A user's health code change can happen with a certain delay. (If I contact an unhealthy-code person, my code might be updated to unhealthy one day later.)
- Assume users must show their health code to enter a place, which means a user's phone is always on and providing their data to us. (There is no case where someone goes somewhere and we cannot get the data.)
Design
The whole system can be separated into two (or three) parts:
- A part that lets users check their own code: read-heavy.
- A part that collects users' time and location data: write-heavy.
As time goes by, a certain range of the data collected in part 2 becomes complete, and according to requirement 2 we can assume all the data in that time range is in the database with no more writes coming. At that point we can pull the data out and feed it into an offline batch process that finds all the unhealthy codes' movements and every healthy-code user they contacted. An automated process or a human administrator can then make the final code change. A rough sketch of that batch step follows.
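A rough sketch of that offline batch step in Python (the record shape and names are hypothetical; real data would be bucketed by time window and location cell):

```python
from collections import defaultdict

# (user_id, time_bucket, location_cell) rows collected by the write-heavy part
records = [
    ("alice", "2022-01-01T10", "cell-42"),
    ("bob",   "2022-01-01T10", "cell-42"),
    ("carol", "2022-01-01T11", "cell-07"),
]
unhealthy = {"alice"}  # users already marked with an unhealthy code

def find_contacts(records, unhealthy):
    """Group a completed time range by (time, place); anyone who shared a
    bucket with an unhealthy user becomes a candidate for a code change."""
    by_bucket = defaultdict(set)
    for user, time_bucket, cell in records:
        by_bucket[(time_bucket, cell)].add(user)
    contacts = set()
    for users in by_bucket.values():
        if users & unhealthy:                 # an unhealthy user was here
            contacts |= users - unhealthy     # everyone else is a contact
    return contacts

print(find_contacts(records, unhealthy))  # {'bob'} -> flagged for review
```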
Summary
The design separates reads from writes and online from offline, so each part can be optimized on its own.
Recently, my new project was pushed to production. You might think I would be excited about it: it is the second project that I designed, developed, and tested, and it finally passed review from the senior engineers and went to production. But these days I am exhausted by this project. I have to admit I was 100% excited about it at the beginning, but by the end I was not excited at all. So what happened?

At the beginning, when I got the project, I would say it was a very good project for me. The requirements were mostly about system design: the old version had already solved the core algorithm, but people were not happy with its stability. What I needed to enhance was the system itself. It needed a more stable and scalable structure, so it would better fit the production situation. I was very excited about implementing this "message queue - worker" pattern. I drew the design diagram, worked on the API document, and created the schema for the MongoDB collections. And because I had learned a lot from the bugs in the first project I worked on, I thought I had covered all the corner cases and that it should be perfect.

After development finished, I started to push it to production. I behaved like the project owner: I talked with the customer success team to understand their needs, their requirements, and of course the timeline; then I talked with cloud ops about the deployment requirements. After that I believed the project was good to go, so I pushed it to the channel to go to production.

But things turned ugly at this point. As I described, this was not a brand-new deployment but an upgrade, though an overall one. Apparently our senior engineers did not agree. After a week of silence, my new design was heavily challenged. At first I felt that was OK, because everyone's concerns were reasonable. Take the structure of the message queue, for example. I would say my message queue was well designed: I had carefully reviewed the code of an open-source message queue implementation, and I had used a message queue in my first project, so I was well prepared to explain my design. But their concern came from another point of view: they wanted me to change the structure so that the auto-scaling policy they were already using could be applied to my workers. I had never thought of that. It was a very good suggestion, so I changed the code and the message queue to fit the standard message queue structure so the auto-scaling policy could apply. (To explain a little more: another service monitors the message queue, and if the queue is slow, it adds machines as queue consumers, so a standard queue helps it determine whether the queue is slow or not.)

I took four suggestions at that point and went back to improve my code. After two weeks, I pushed the code again, waited for a week, and people agreed to proceed with the deployment. Then they requested a small change on a Friday afternoon, so I said I would make it the next Monday, and I did. But the cloud team disappeared again; I waited two weeks, and they came back requiring a stress test and a very strict code review. So this project was delayed and delayed. I tried my best to follow their suggestions, but the bar kept getting higher, and I had to wait more and more on their deployment plan. I was exhausted by the waiting.
From that point on, I stopped being excited and stopped trying to act like the project owner. I went back to being a developer: I just told everyone what I did and waited for requirements from others.
What I really feel is that, from that point, I changed from a code lover into a code worker. Before, I LOVED to code, so I did the job and was really happy with the results. Now I am more and more standardized. I have to say I learned a lot from the standards, and they are very useful, but I lost my love for this project. I appreciate that this change made me a better software engineer, but I don't appreciate that this project made me a code worker. I hope I can gain the experience without losing my love of coding.
When I got home I started playing State of Decay, a virus-zombie game. Watching the zombies and the antidote injections in it somehow felt really fitting. I played for about three hours, and that day I forced myself to take an afternoon nap. When I took off my shirt, I was startled to find that raising my arm already hurt a lot. In the evening I watched a four-hour, extremely boring movie. The next day I worked from home as usual until 5 pm; with my work done, I was about to take a short break and look at the next day's schedule when I suddenly felt my eyelids getting very heavy, so I lay down in bed to rest. This was about 30 hours after the shot. Then the chills started. I remember the outdoor temperature was 21°C, and indoors I had kept the air conditioning at 76°F, about 25°C, but I had to cover myself with my winter blanket or I would feel cold. I slept until about 7 pm; my skin lost its sense of hot and cold and my mouth got dry, but the soreness in my arm started to ease. Then I began a loop of drinking water and going to the bathroom, drinking about twice as much water as usual. During that time I took my oral temperature: 36.6°C, no fever, but I started sweating under the covers, and even after switching to a thin quilt I kept sweating. Overall it felt like a cold. Then today: when I got up, the chills and fatigue were gone and my arm was even less sore; after a shower I went to work as usual. So far nothing feels off.
I bought a NAS. I am going to use it for backups and, in the meantime, to enhance the whole network.
I have a VPN server; I hope to use it to access home services from outside, instead of using DDNS or a static IP.
I want to expose some services from my home network, e.g. a VS Code server and a Jupyter notebook server, so that I only need Chrome on my mobile devices. And for small projects like this blog, I can put the code on the server and use one machine to maintain the blog.
Design
Old Design and Problem
The old design is shown above. The solution was to use a server running Ubuntu as a middle router, which needs to provide:
- a connection to the outside internet
- a connection to the VPN server
- DHCP assignment for the inner network
Speed became the biggest problem. I found that OpenVPN has a speed limit here; normally 100 Mbps is the ceiling. That is unacceptable. The VPN server is also not physically near my home, so there is already some lag, but I thought I could compromise on the lag. Not on the speed, though: never compromise on the speed.
New Design and Solution
The new design is shown above. The solution is to separate the whole inner network into three parts:
- The modem assigns/leases IPs in 192.168.1.xxx; most devices use this to connect to the internet, and it is a full-speed network.
- The Raspberry Pi assigns/leases IPs in 192.168.2.xxx, which auto-connects to the VPN; it is a limited-speed network.
- The VPN server assigns/leases IPs in 10.8.0.xxx. Inner services are opened to this network.
So, if you want full speed, choose the modem network; if you want the VPN, choose the Raspberry Pi network. The Raspberry Pi forwards ports to the VPN network, so connections coming in from the VPN can access inner-network services. And even though we expose these services, they are still inside the VPN, so it is very safe.
Useful Ubuntu tools
As mentioned before, the core of this design is the Raspberry Pi, and there is a lot of manipulation of the Ubuntu server running on it. I read a lot to learn this, so I am keeping this part as a reference for next steps; here we will talk about some tools for future use.
crontab
cron is a very common Ubuntu tool for timed behavior. It provides a trigger, so at a certain time it can run an action. We introduce it here because it has a config option, @reboot, which runs a command when the server boots up; this is much easier than creating an Ubuntu service to do the same thing.
Install:
```bash
apt-get install cron
```
Run this:
```bash
crontab -e
```
Then add this line to the file:
```bash
@reboot Your-Command
```
screen
This is a very common Ubuntu tool for running a long-lived service; it keeps a service alive even after you close the terminal. Here are some common commands:
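To keep the VPN client alive I use the standard screen workflow (standard screen usage, nothing specific to this setup):

```bash
screen -S vpn    # start a new named session
                 # run the long-lived command inside, then detach with Ctrl-A d
screen -ls       # list running sessions
screen -r vpn    # re-attach later
```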
openvpn client: this command starts the client inside such a session; you still need to provide the config and password before execution.
```bash
sudo openvpn --config xx.ovpn --askpass pass.file
```
Ubuntu netplan && network
If you have used Ubuntu desktop and then switched to Ubuntu server, you will find that netplan was introduced after 16.04(?) as a replacement for the old networking config. You can still go back to editing /etc/network/interfaces with some extra settings, but we will talk about netplan here. netplan keeps its config in different files:
```
00-installer-config.yaml
50-cloud-init.yaml
```
You can use one command to generate the combined rules for a test run:
```bash
sudo netplan --debug generate
```
At the end, you can apply the rules:
```bash
sudo netplan apply
```
Each file is a YAML file. Here is an example of a static IP connection setting.
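(A sketch of a typical static-IP entry; I am treating the interface name and addresses below as placeholders, not my exact config.)

```yaml
network:
  version: 2
  ethernets:
    eth0:                           # placeholder interface name
      dhcp4: no
      addresses: [192.168.1.2/24]   # placeholder static address
      gateway4: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]
```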
Next come two NAT rules (shown below). The first one means that when connections reach the post-routing stage, they are handed to the network interface tun0, so all traffic passes through tun0. The second one means that in the pre-routing phase, all requests to port 21 are forwarded to port 21 on 192.168.1.21.
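In command form, those two rules look like this (reconstructed from the description above, not copied from my shell history; addresses may differ):

```bash
# POSTROUTING: hand all outgoing traffic to tun0
sudo iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE
# PREROUTING: forward requests on port 21 to 192.168.1.21:21
sudo iptables -t nat -A PREROUTING -p tcp --dport 21 -j DNAT --to-destination 192.168.1.21:21
```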
If you want to delete a rule above after making a mistake, simply change -A to -D.
I followed this blog to get started, and made some adjustments of my own: Base
1. Change the IP
2. Set up the DHCP server
3. Configure IP forwarding
4. Set up NAT routing
Steps 1-3 mostly worked for me as written; I needed to do some more work on step 4.
How I set up my NAT routing rules
Let's talk about network interfaces first. I have three network interfaces, representing the three networks; two of them are set up in netplan, and tun0 is created automatically when the VPN connection is established:
- eth0 is used for 192.168.1.xxx
- tun0 is the VPN, used for 10.8.0.xxx
- eth1 is used for 192.168.2.xxx
My post-routing is very simple, like this:
```
Chain POSTROUTING (policy ACCEPT 14213 packets, 1060K bytes)
 pkts bytes target      prot opt in  out   source          destination
   39  2702 MASQUERADE  all  --  *   eth0  0.0.0.0/0       192.168.1.0/24
    2    96 MASQUERADE  all  --  *   eth1  0.0.0.0/0       192.168.2.0/24
 4101 2089K MASQUERADE  all  --  *   tun0  192.168.2.0/24  0.0.0.0/0
```
Some explanations:
- All connections targeting 192.168.1.xxx are handed to eth0.
- All connections targeting 192.168.2.xxx are handed to eth1.
- The rest go to tun0.
The first two rules are used by the port forwarding below; the last one is used for inner-network connections. Commands to create rules like these are sketched next.
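For reference, rules like the ones in the listing can be added with commands along these lines (reconstructed to match the listing, not copied from my shell history):

```bash
sudo iptables -t nat -A POSTROUTING -o eth0 -d 192.168.1.0/24 -j MASQUERADE
sudo iptables -t nat -A POSTROUTING -o eth1 -d 192.168.2.0/24 -j MASQUERADE
sudo iptables -t nat -A POSTROUTING -o tun0 -s 192.168.2.0/24 -j MASQUERADE
```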
Then you can use a command like sudo iptables -t nat -A PREROUTING -p tcp --dport 21 -j DNAT --to-destination 192.168.1.2:21 to do port forwarding (use -D instead of -A to remove the rule). Here I forward connections on port 21 to 192.168.1.2.
How to save all routing settings persistently
First of all, apart from the tun0 rule, all the settings can be saved with iptables-persistent, as the blog (Base) describes. The only problem is that when I update the rules, I have to reinstall iptables-persistent and save again, otherwise they won't be saved; I don't know whether this is a bug. However, this does not work for rules related to tun0, because tun0 only exists while the VPN connection is established. So if you save the rule above, after a restart what was tun0 is automatically changed to eth0:
```
Chain POSTROUTING (policy ACCEPT 14213 packets, 1060K bytes)
 pkts bytes target      prot opt in  out   source          destination
   39  2702 MASQUERADE  all  --  *   eth0  0.0.0.0/0       192.168.1.0/24
    2    96 MASQUERADE  all  --  *   eth1  0.0.0.0/0       192.168.2.0/24
 4101 2089K MASQUERADE  all  --  *   eth0  192.168.2.0/24  0.0.0.0/0
```
So what we do is use the auto-start to bring up the VPN client first, and then add the NAT routing rule. The way to do it is to use crontab -e to add a command for execution at boot, as sketched below.
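A minimal sketch of such an entry (the paths, sleep timings, and password file are placeholders, and I am assuming this goes into root's crontab so no sudo is needed):

```bash
# wait for the network, start the VPN client in the background,
# then add the tun0 NAT rule once the interface exists
@reboot sleep 30 && openvpn --daemon --config /root/client.ovpn --askpass /root/pass.file && sleep 10 && iptables -t nat -A POSTROUTING -o tun0 -s 192.168.2.0/24 -j MASQUERADE
```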
When you set up the network, you might lose your connection to the server if you make a mistake. I hadn't bought an HDMI cable for the Raspberry Pi before this, so I had to reset it several times to get the connection back. I highly recommend having a plan B for when the network connection breaks: use an HDMI cable to undo the mistake. Also, when you change the whole network, you might still have a Wi-Fi connection at the top level, so if a connection breaks you still have internet over Wi-Fi and can search for the problem; it also makes it easy to set things back when you are finished.