from clearml import Task
task = Task.init(project_name="Inbar2022/LanguageFactoryDanish/lions_test", task_name="lions3")
python main.py --cuda --epoch 1
compare between both of the tasks
SuccessfulKoala55 any clue?
got it, I don't really understand why it happens, quite certain I didn't see this in the past
the same basic job does not get overwritten; a new one is created every time
okay, so this works only if the jobs run in parallel
the first job creates a new task ID
the second job (started immediately after the first job) reuses it properly
if I wait for the first job to finish and then run a new second job with the same name, it is not reused
is this expected?
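for anyone comparing the two behaviours, here is a minimal sketch of making the reuse choice explicit instead of relying on the default. It assumes the reuse_last_task_id argument of Task.init; the helper names task_init_kwargs and start_task are hypothetical, and the exact conditions under which ClearML reuses a task by default are decided by ClearML itself and are not spelled out here.

```python
def task_init_kwargs(project_name, task_name, always_new=False):
    """Build keyword arguments for clearml.Task.init (hypothetical helper)."""
    return {
        "project_name": project_name,
        "task_name": task_name,
        # False asks ClearML to always create a new task;
        # True (the default) lets ClearML decide whether to reuse an earlier one.
        "reuse_last_task_id": not always_new,
    }


def start_task(**kwargs):
    """Start a task; needs clearml installed and a configured server."""
    from clearml import Task  # imported lazily so the helper above works without clearml

    return Task.init(**kwargs)


kwargs = task_init_kwargs(
    "Inbar2022/LanguageFactoryDanish/lions_test", "lions3", always_new=True
)
# task = start_task(**kwargs)  # uncomment on a machine with ClearML configured
```

with always_new=True every run should get its own task ID, whether or not the previous job has finished.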
didn't do that test
I usually wait for first job to finish before I start new one
let me dig in more and hopefully can share successful results
thanks!
tried with my user and edited existing user record in apiserver.conf
it looks like ClearML treated this as a new user - I did not see any of the jobs that belonged to my user before the change
I have tried a small task that only uploads a single file
from PIL import Image  # Pillow, needed for Image.open
logger = task.get_logger()
img = Image.open("./1_model.png").convert("RGB")
logger.report_image(title="cfg_0", series="Model", iteration=1, image=img)
ended with:
Retrying (Retry(total=0, connect=5, read=5, redirect=5, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)'))': /
202...
SuccessfulKoala55 looks OK (?)
>>> StorageHelper.get(Task._get_default_session().get_files_server_host())._container.session.verify
InsecureRequestWarning: Certificate verification is disabled! Adding certificate verification is strongly advised. See:
True
in case this helps someone else: I did not have root access to the training machine to add the cert to the store
you can point your Python to your own CA using:
export CURL_CA_BUNDLE=/path/to/CA.pem
so I think I'm in the right direction
adding verify=
and pointing to my CA.pem looks like the right approach
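a rough sketch of that approach with plain requests, in case it is useful. The lookup order (REQUESTS_CA_BUNDLE first, then CURL_CA_BUNDLE) mirrors what requests does on its own when verify is left at its default; resolve_ca_bundle and make_session are hypothetical helper names, and this says nothing about how ClearML builds its own sessions.

```python
import os


def resolve_ca_bundle(environ=None):
    """Return the CA bundle path configured via environment variables, or None."""
    env = os.environ if environ is None else environ
    # requests consults REQUESTS_CA_BUNDLE first and falls back to CURL_CA_BUNDLE
    return env.get("REQUESTS_CA_BUNDLE") or env.get("CURL_CA_BUNDLE") or None


def make_session(ca_bundle=None):
    """Create a requests session that verifies TLS against ca_bundle."""
    import requests  # imported lazily; requires the requests package

    session = requests.Session()
    # A path makes requests verify against that CA file;
    # True keeps the default trust store.
    session.verify = ca_bundle if ca_bundle else True
    return session


ca_path = resolve_ca_bundle({"CURL_CA_BUNDLE": "/path/to/CA.pem"})
# session = make_session(ca_path)
# session.get("https://...")  # placeholder URL, fill in your own server
```

passing the path explicitly is equivalent to exporting the variable, but keeps the setting local to one session.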
now, how do I use it with ClearML API?
cleanup_service
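for task in tasks:
    try:
        deleted_task = Task.get_task(task_id=task.id)
        print(deleted_task.name)
        deleted_task.delete(
            delete_artifacts_and_models=True,
            skip_models_used_by_other_tasks=True,
            raise_on_error=False,
        )
    except Exception as ex:
        # keep going if a single task cannot be deleted
        print(f"Failed to delete task {task.id}: {ex}")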
it throws the SSL error,...
SDK version: 1.14.4
clearml-server version: Server: 1.14.0-431 • API: 2.28
I think there are some experiments that are messing up MongoDB
these unusual entries show up in the clearml-mongo logs:
{"t":{"$date":"2023-09-19T12:15:50.685+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn73","msg":"Slow query","attr":{"type":"command","ns":"backend.model","command":{"distinct":"model","key":"project","query":{"$and":[{"$or":[{"company":{"$in":["d1bd92a3b039400cbafc60a7a5b1e52b",null,""]}},{"company":{"$exists":false}}]},{"user":{"$in":["197aea8467d3f471fc0db98b57ed80fa"]...
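to see which collections these slow queries hit, a small filter over the log can help. This is only a sketch: it assumes the one-JSON-object-per-line format shown above and the durationMillis field that MongoDB writes on slow-query entries. The sample line is a shortened, illustrative stand-in (its duration value is made up), not the real entry, and slow_queries is a hypothetical helper.

```python
import json


def slow_queries(lines, min_millis=0):
    """Yield (timestamp, namespace, duration_ms) for 'Slow query' log entries."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or otherwise invalid lines
        if entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        duration = attr.get("durationMillis", 0)
        if duration >= min_millis:
            yield entry.get("t", {}).get("$date"), attr.get("ns"), duration


# Illustrative sample only: shortened entry with a made-up duration.
sample = [
    '{"t":{"$date":"2023-09-19T12:15:50.685+00:00"},"s":"I","c":"COMMAND",'
    '"id":51803,"ctx":"conn73","msg":"Slow query",'
    '"attr":{"type":"command","ns":"backend.model","durationMillis":250}}',
    "not a json line",
]

for timestamp, namespace, millis in slow_queries(sample):
    print(timestamp, namespace, millis)
```

on a real server the lines could come from the container logs, e.g. piped in through sys.stdin.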