thanks @<1523701070390366208:profile|CostlyOstrich36>
I've done this successfully using the API already
as for the sdk option - in which format should I provide the list of tasks/projects to the sdk?
for only_fields=["id", "name", "created", "status_changed", "status", "user"]:
output example
{'id': '02a3f5929cf246138994c9243a692219', 'name': 'docfm_v7_safe_32gpu80g_11Jan24_4w', 'created': datetime.datetime(2024, 1, 11, 9, 54, 33, 406000, tzinfo=tzutc()), 'status_changed': dateti...
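for reference, a minimal sketch of how this kind of query can go through the SDK's APIClient (the response handling at the end is my assumption, not verified output):

from clearml.backend_api.session.client import APIClient

client = APIClient()
# only_fields trims the response down to just the fields we care about
tasks = client.tasks.get_all(
    only_fields=["id", "name", "created", "status_changed", "status", "user"]
)
for t in tasks:
    print(t.id, t.name, t.status)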
I know about the 500 limit and I'm using it
but my while loop keeps pulling the same 500 ... and runs endlessly
import requests  # url / headers for the ClearML api server are set up earlier

offset = 0
limit = 500
all_data = []
while True:
    params = {
        'offset': offset,
        'limit': limit
    }
    response = requests.get(url, headers=headers, params=params, verify=False)
    data = response.json()
    projects = data['data']['projects']
    print(f"pulled {len(projects)} projects.")
    if len(projects) == 0:
        print("no project found - exiting ...")
        break
    all_data.extend(projects)
    offset += limit
print(f"pulled {len(all_data)} projects in total.")
not sure it's the same use case but I will start asking around
if you have any other hints on how to query mongo and look for the potential culprit, I'd be glad to hear them
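a minimal sketch of what I'd try myself, assuming direct access to the clearml-mongo instance; the backend database and the model collection/fields come from the slow-query log further down this thread, the connection URI is hypothetical:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
db = client["backend"]
# count models per project to spot projects with an unusual number of documents
pipeline = [
    {"$group": {"_id": "$project", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10},
]
for row in db["model"].aggregate(pipeline):
    print(row)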
so I have a large json with a list of task IDs
which I want to delete in bulk
the API is doable
how about the SDK? how do I provide a list of task IDs for deletion?
from the cleanup example:
for task in tasks:
    try:
        deleted_task = Task.get_task(task_id=task.id)
        deleted_task.delete()
    except Exception as ex:
        print(f"failed to delete task {task.id}: {ex}")
how do I set tasks when starting from a known list of task IDs?
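a minimal sketch of what I mean, assuming the IDs were already loaded from the json (the file name and error handling are placeholders):

import json
from clearml import Task

# hypothetical file holding the list of task IDs
with open("task_ids.json") as f:
    task_ids = json.load(f)

for task_id in task_ids:
    try:
        Task.get_task(task_id=task_id).delete()
    except Exception as ex:
        print(f"failed to delete {task_id}: {ex}")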
@<1523701435869433856:profile|SmugDolphin23> working! here is what I have on Fedora/RHEL:
- copy the certs to /etc/pki/ca-trust/source/anchors/
- run update-ca-trust
ohhh, severe error on my side here 🙂
I was mixing it up with another API I worked on and did not read the right flag carefully
simply adding page to the body did the trick
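for anyone hitting the same thing, a minimal sketch of the page-based variant that worked for me (url / headers as in the earlier snippet, field names assumed from the projects.get_all API):

import requests

page = 0
page_size = 500
all_data = []
while True:
    # page / page_size go in the request body, not in query params
    response = requests.post(
        url,  # e.g. <api_server>/projects.get_all
        headers=headers,
        json={"page": page, "page_size": page_size},
        verify=False,
    )
    projects = response.json()["data"]["projects"]
    if not projects:
        break
    all_data.extend(projects)
    page += 1
print(f"pulled {len(all_data)} projects in total")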
thanks again @<1724235687256920064:profile|LonelyFly9>
@<1523701087100473344:profile|SuccessfulKoala55> seen this with 1.13.2
is it worth trying to upgrade to the latest?
ok, hopefully someone will share some thoughts on how it went for them 🙂
I have tried a small task that only uploads a single file
from PIL import Image

logger = task.get_logger()  # task comes from an earlier Task.init()
img = Image.open("./1_model.png").convert("RGB")
logger.report_image(title="cfg_0", series="Model", iteration=1, image=img)
ended with:
Retrying (Retry(total=0, connect=5, read=5, redirect=5, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)'))': /
202...
I have another instance with clearml-server 1.7 and I get the same behavior
am I missing anything? I was under the assumption that jobs with the same project/task names should be overwritten, not duplicated
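a minimal sketch of what I expected, assuming the reuse behavior is controlled by reuse_last_task_id on Task.init (project/task names are placeholders):

from clearml import Task

task = Task.init(
    project_name="my_project",  # placeholder
    task_name="my_task",        # placeholder
    # with reuse_last_task_id=True (the default), a rerun should reuse/overwrite
    # the previous draft task instead of creating a duplicate
    reuse_last_task_id=True,
)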
Hey @<1688125253085040640:profile|DepravedCrow61>
should I open an issue to follow this up?
we are seeing this bug in almost every task
looking into ES index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b
docs.count   docs.deleted   store.size   pri.store.size
2118131043   29352476       265.1gb      265.1gb
sounds like we're hitting some ES limitation?
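for context, a minimal sketch of how the index stats above can be pulled, assuming ES is reachable on the default port on the server host:

import requests

# list the ClearML events indices with doc counts and on-disk size
r = requests.get(
    "http://localhost:9200/_cat/indices/events-*",
    params={"v": "true", "h": "index,docs.count,docs.deleted,store.size,pri.store.size"},
)
print(r.text)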
to be honest, the use case is mostly convenience
when people train ~5000+ experiments, all saved in a few sub folders with long strings as experiment names
before publishing a paper, for example, we want to move or copy a small number of successful trainings to a separate location and share them with other colleagues/management
I'd guess the alternatives could be:
changing the name of the successful training under the existing sub folder
using move instead of clone (see the sketch below)
anything else?
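a minimal sketch of the move option, assuming Task.move_to_project is used (the task ID and project name are placeholders):

from clearml import Task

# move a finished experiment into a shared project instead of cloning it
task = Task.get_task(task_id="<task_id>")  # placeholder ID
task.move_to_project(new_project_name="shared/paper_2024")  # placeholder name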
Hi VivaciousPenguin66
thanks for sharing, giving it a try now
after you set up the webserver to point to 443 with HTTPS, what did you do with the rest of the http services clearml is using?
does the webserver on 8080 remain accessible, and are you pointing to it in your ~/clearml.conf?
what about the apiserver and fileserver? (8008 & 8081)
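for reference, a minimal sketch of the api section in ~/clearml.conf, assuming all three services end up behind HTTPS (the hostnames are hypothetical):

api {
    web_server: https://app.example.com
    api_server: https://api.example.com
    files_server: https://files.example.com
}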
not really - I can try to run these in parallel
{
"id": "c2e33466fd9de1a0d2be8d803ad4fbed",
"company": "d1bd92a3b039400cbafc60a7a5b1e52b",
"name": "John Doe",
"family_name": "John",
"given_name": "Doe",
"created": "2024-09-24T06:13:59.956000+00:00"
}
this is what the object looks like
I think there are some experiments that are messing up mongodb
these log lines look unusual in the clearml-mongo logs:
{"t":{"$date":"2023-09-19T12:15:50.685+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn73","msg":"Slow query","attr":{"type":"command","ns":"backend.model","command":{"distinct":"model","key":"project","query":{"$and":[{"$or":[{"company":{"$in":["d1bd92a3b039400cbafc60a7a5b1e52b",null,""]}},{"company":{"$exists":false}}]},{"user":{"$in":["197aea8467d3f471fc0db98b57ed80fa"]...
for some reason it's not in the REST API docs, but I used users.get_all
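a minimal sketch of the call through the SDK's APIClient, assuming the undocumented users.get_all behaves like the other get_all endpoints:

from clearml.backend_api.session.client import APIClient

client = APIClient()
# users.get_all is not in the REST docs but responds like other get_all calls
for user in client.users.get_all():
    print(user.id, user.name, user.created)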
@<1523701070390366208:profile|CostlyOstrich36> unfortunately, this is not the behavior we are seeing
the same exact issue happened tonight
at epoch 53 ClearML was shut down; the job did not continue to epoch 54 and eventually got killed by the watchdog timer