VivaciousBadger56 You're basically answering yourself, so kedro = lean features, strong community; ClearML = many features, small (growing) community; and mlrun has a good name
Hi, just chiming in with a lesson learnt on my subreddit r/mlops - when shortlisting open-source MLOps infra, the bundled features are less important KPIs than stability and longevity markers:
- community adoption
- active slack channel
- good documentation
- clear monetization scheme (how much does it cost if you decide to go SaaS instead of paying for your own infra) - even if you never intend to go SaaS, it helps to understand if the OSS is actually "freemium" or not.
Hope that helps!
will do and report back! thanks
SmugDolphin23 only set max_worker=1 and it seems to work. thanks!
wow i had the same problem, this should go into the FAQ AgitatedDove14
ok scratch that - you can override TMPDIR in the env. much better!
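For anyone hitting the same thing from Python rather than the shell, here is a minimal sketch of the TMPDIR override. It only shows how Python's standard `tempfile` module picks up the variable; whether every temporary file clearml-data writes goes through `tempfile` is an assumption, and `use_scratch_tmpdir` is a hypothetical helper name, not part of ClearML.

```python
import os
import tempfile


def use_scratch_tmpdir(scratch_dir: str) -> str:
    """Point Python's tempfile module at scratch_dir via the TMPDIR variable.

    Hypothetical helper for illustration; not a ClearML API.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    # Child processes started afterwards inherit this value as well.
    os.environ["TMPDIR"] = scratch_dir
    # tempfile caches the directory it picked; clear the cache so TMPDIR is re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

On the command line (e.g. in an airflow task) the equivalent would be exporting TMPDIR to a directory on the scratch drive before calling clearml-data; the exact path depends on your setup.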
SweetBadger76 , AgitatedDove14 , creating a dataset with parents worked very well and produced great visuals on the UI!
AgitatedDove14 I tried the squash solution, however this somehow caused a download of all the datasets into my /tmp folder, filling up the instance? I have a special drive for .clearml cache, how can I tell clearml-data to only use that?
super, makes sense, but can it NOT use /tmp for this? I'm merging about 100GB of files and it is quite heavy on the partition. maybe I could put an env variable to divert it to scratch?
Yeah the hack would work but I'm trying to use it from the command line to put in airflow. I'll post on GH
used Nvidia pytorch container 22.04 instead of the default one, also tried to add jupyterlab (opened up the default ports on azure console). task seems successful, still no ssh tunnel.
AgitatedDove14 CostlyOstrich36 yes! that did the trick. I added the 10022 on the azure networking pane and session is now working!!
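A quick way to check this kind of thing before digging into the session logs is to test whether the port is reachable from your own machine. This is a rough sketch using only the standard library; `port_is_reachable` is a hypothetical helper, and the host you pass in is whatever address your instance exposes.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling it with the instance address and port 10022 should return False while the port is still closed in the azure networking pane, and True once the rule is added and the session is listening.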
CostlyOstrich36 seeing an awful lot
DEBUG:urllib3.connectionpool:Resetting dropped connection: api.clear.ml
ps, the agent is in docker mode, I wonder why it uses the host mapping for the clearml cache folder
Okay that was because it wasn't in docker mode for this reproduction
okay I was prematurely happy. will update soon
CostlyOstrich36 I ran using the default docker, still a tunnel problem. this is what I got eventually:
` Creating config file /etc/ssh/sshd_config with new version
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:TTE+YCJmi2NOpH/ykzdHiP+MgCfKkZXocwUyu58GuAA root@Merlin-dev (RSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:ks6yr6FpKp5pyLU9NRLK/K96BYieuivwqw7RKAaQHIA root@Merlin-dev (ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:0JxV...
why you are not starting threads from user issues is beyond me. anyway, iirc it can also happen if you are using the same virtualenv for two trains-agents [mistakenly] and one of them uninstalls certifi
hmm… ReassuredOwl55 a lot has changed in datasets internals since then. please refer to the docs and videos to see how many exciting features were added.
FWIW I think it will turn out okay once you finish uploading the state file