Hey Martin, I will, but it's a bit trickier because we have modifications in the code that I have to merge on our side
Hi Martin, thanks a lot for looking into this so quickly. Will you let me know the version number once it's pushed? Thanks!
any timeline on this that you are aware of?
no requests are being served, as in there is indeed no traffic
so I still can't figure out what sets the task status to aborted
alright, so actually we noticed that the problem disappears if we use only sync requests. Meaning, if I create a sleep endpoint that is async we get the 502, but if it's sync we don't
was allow_archived removed from Task.query_tasks?
hey Martin, real quick actually: on your update to the requirements.txt file, isn't that constraint on fastapi inconsistent?
we have tried both and got the same issue (gunicorn vs uvicorn).
No, I meant creating a sync endpoint:
import time

from fastapi import APIRouter, status
from pydantic import BaseModel

router = APIRouter()

class TestResponse(BaseModel):
    status: str

@router.post(
    "/sleep",
    tags=["temp"],
    response_description="Return HTTP Status Code 200 (OK)",
    status_code=status.HTTP_200_OK,
    response_model=TestResponse,
)
# plain def here instead of async def: FastAPI runs sync endpoints in a
# threadpool, so the blocking sleep does not tie up the event loop
def post_sleep(time_sleep: float) -> TestResponse:
    """Block for time_sleep seconds, then return OK."""
    time.sleep(time_sleep)
    return TestResponse(status="OK")
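(For reference, roughly how I trigger it, sketched with requests; the host and port are placeholders, and the 35-second sleep is chosen to exceed the ALB's 30-second idle timeout. Hitting the app directly just blocks for 35 s; it's only through the ALB that we see the 502:)

import requests

# sleep longer than the ALB's 30 s idle timeout to provoke the 502
resp = requests.post("http://localhost:8080/sleep", params={"time_sleep": 35})
print(resp.status_code)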
so they ping the web server?
how can you be >= 0.109.1 and lower than 0.96?
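(In other words, a pin like the following, my reconstruction of the constraint in question, is unsatisfiable, since no release can be both at least 0.109.1 and below 0.96:)

fastapi>=0.109.1,<0.96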
Hi Martin,
- Actually we are using an ALB with a 30-second timeout
- we do not have GPU instances
- Docker version 1.3.0
We put back the additional changes and so far it seems that this has solved our issue. Thanks a lot for the quick turnaround on this.
Actually the requests are never registered by the gunicorn app, and the ALB logs show that there is no response from the target ("-").
ok so I haven't looked at the latest changes after the sync this morning, but the ones we put in yesterday seem to have fixed the issue; the service is still running this morning at least.
I will actually write here what I found: trigger_on_tags and trigger_required are actually the same and are concatenated with OR. You need to make sure you put "__$all" before the tags if that's the behavior you want.
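(To illustrate, a minimal sketch with ClearML's TriggerScheduler; I'm assuming the add_task_trigger signature roughly as documented, and the trigger name, queue, and task id are placeholders:)

from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=3)

# Without the "__$all" marker the tags are matched with OR semantics;
# prefixing the list with "__$all" requires all of the following tags.
scheduler.add_task_trigger(
    name="retrain-on-validated",                      # placeholder name
    trigger_on_tags=["__$all", "validated", "ready"], # AND over both tags
    schedule_task_id="<base-task-id>",                # placeholder task id
    schedule_queue="default",
)
scheduler.start_remotely(queue="services")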
there is, in my opinion, a bug in the deserialization process, because the triggers get de-duped by trigger name, yet when using trigger_project there are dozens of triggers being created with the same name (one per dataset in the project). This leads to random behavior dependi...
that's a fair point. Actually we have switched away from siege, because we believe it was causing the issues, and are using Locust now instead. We have been running for days at the same rate and don't see any errors being reported...
Hey thanks a lot Alex, that's exactly what I was looking for. Cheers
ok great, I'll check what other changes we missed yesterday
Hi @<1523701087100473344:profile|SuccessfulKoala55>,
I'm running into almost the same error (see below), but I want to connect to the free ClearML server version at None, so I have set up the corresponding env variables in example.env:
CLEARML_WEB_HOST=""
CLEARML_API_HOST=""
CLEARML_FILES_HOST=""
CLEARML_API_ACCESS_KEY="---"
CLEARML_API_SECRET_KEY="---"
CLEARML_SERVING_TASK_ID="---"
I have set up the right values from...
Geez, I have been looking for this for a while, thanks for saving my day...again.
yeah I don't know, I think we are probably just trying to fit too high a throughput for that box, but it's weird that the packets just get dropped; I would have assumed the response time would degrade and requests would be queued.
what is actually setting the task status to Aborted?
This being said, now I'm running into another issue: this seems to be "erasing" all the packages that had been set in the base task I'm cloning from. I can't find a method that would return these packages so that I could add to them?
I have tested with an endpoint that basically adds two numbers and never managed to trigger the 502. I'm starting to wonder if we are not just running too many workers. I had it in mind that 2 vCPUs should mean 5 workers would be good, but I think it should probably be closer to 2, though I'm not sure why that would lead to requests being dropped
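(For context, the 5 came from the usual gunicorn heuristic of (2 x cores) + 1 workers; a sketch of that as a gunicorn.conf.py, with the worker class and bind address as assumptions:)

import multiprocessing

# gunicorn docs heuristic: (2 x CPU cores) + 1, i.e. 5 workers on 2 vCPUs
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"  # async workers for FastAPI
bind = "0.0.0.0:8080"                           # placeholder bind address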
I'm assuming that task.data.script.requirements is not the right way to do this...
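(For completeness, the kind of workaround I was fishing for, as a sketch; task.data.script.requirements being a dict with a "pip" key and Task.set_packages are assumptions to verify against your clearml version, and the task id and extra package are placeholders:)

from clearml import Task

# clone the base task, then read back whatever requirements it carries
base_task = Task.get_task(task_id="<base-task-id>")  # placeholder id
cloned = Task.clone(source_task=base_task, name="clone-with-extras")

# assumed internal layout: {"pip": "numpy==1.26.0\npandas==2.1.0"}
existing = (cloned.data.script.requirements or {}).get("pip", "")
packages = [line for line in existing.splitlines() if line.strip()]

packages.append("scikit-learn==1.4.0")  # hypothetical extra package
cloned.set_packages(packages)           # write the merged list back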
